Indexes are used to have a better performance while retrieving data. I'm not going to refer basics in this article. Rather, I'd like to share my experiences with the well known indexes i.e., Clustered and Non Clustered Indexes and how they are used by SQL Server Query Optimizer in its execution plan.
As the performance of index can be observed when there is a huge amount of data, let's have that data using the following code,
CREATE
DATABASE
YASHWANTH
GO
USE
YASHWANTH
GO
CREATE
TABLE
TEST
(ID
INT,NAME
VARCHAR(100),CONSTRAINT
PK_TEST_ID PRIMARY
KEY
(ID))
GO
DECLARE
@ID INT
=
1
WHILE
(@ID
<=
2000000)
BEGIN
INSERT
INTO
TEST SELECT
@ID,'A_'+CAST(@ID
AS
varchar)
SET
@ID =
@ID +
1
END
GO
Now that, we have 20 lakhs of unique records in our table. As we created a table with a primary key column, a clustered index is created automatically. Let's do some operations on this table.
Turn on "Actual Execution Plan" and run the following query,
SELECT
*
FROM
TEST
After the entire table result is retrieved, check the execution plan where we can find this,
As per our knowledge, query optimizer uses the primary key's clustered index and scans it using the operator "Clustered Index Scan". If a "where" clause is used then it uses the same index using "Clustered Index Seek" operator.
We know that among all indexes, Clustered is a high prioritized index and this is why it gets created automatically when a primary key is created on the table. Is it true in all the cases?
Let's create a Non Clustered Index on the column "Name" of our table. You can create a unique or non-unique or ascending or descending. Here, I'm creating a normal non-unique ascending non clustered index,
CREATE
NONCLUSTERED
INDEX
IX_TEST_NAME ON
TEST
(NAME)
Now run the below query and observe the execution plan,
SELECT
*
FROM
TEST
Though there is a high prioritized clustered index, optimizer used newly created Non Clustered index. We know that clustered index orders table data on the disk according to it. But why does optimizer go for non clustered index instead of clustered index.
Now, let's do some more operations to have our doubts cleared. Clear the procedure cache.
CHECKPOINT
DBCC
FREEPROCCACHE
DBCC
DROPCLEANBUFFERS
We can force the optimizer to use our favourite clustered index instead of non clustered index. Let's try that now. Run the following query,
SELECT
*
FROM
TEST WITH
(INDEX(PK_TEST_ID))
My system is of I5 processor with 8GB of physical memory. It used clustered index and retrieved 20 lakh records within 11 seconds. Now run the below query,
SELECT
*
FROM
TEST
It used its favourite non clustered index and retrieved the same result of 20 lakh records within 10 seconds in my system.
Why did this happen?
Because SQL Server is built with query optimization capability without the manual intervention. The optimizer knows it can get result faster when it uses non clustered index and hence it used it. Earlier, it had only clustered index to use and now it has two and an option to decide which can give better performance.
Still confused?
Take the following query,
SELECT
ID FROM
TEST
Now, it should definitely use the clustered index as we used only the column "ID" in our SELECT list. But execute it and see the result and execution plan,
In my system, I got the result in 9 seconds this time, performing "Non Clustered Index Scan".
Let's see the CPU usage when we run our previous queries individually,
SELECT
*
FROM
TEST
GO
SELECT
*
FROM
TEST WITH
(INDEX(PK_TEST_ID))
Observe the Subtree Cost, I/O Cost and Operator Cost for both the queries where we can find that using Non Clustered Index incurred lesser costs. Now, run the both queries at once and see the execution plan to get the percentage wise performance stats,
Using clustered index used 52% of the total 100% execution plan for both the queries. Since SQL Server 2005, query optimizer knows how to use resources to have the better performance.
When used a "where" clause, optimizer uses "Clustered Index Seek" operator for the below first two queries and "Non Clustered Index Seek" operator for the third query,
SELECT
*
FROM
TEST WHERE
ID =
1 AND
NAME =
'A_1'
GO
SELECT
*
FROM
TEST WHERE
NAME =
'A_1'
AND
ID =
1
GO
SELECT
*
FROM
TEST WHERE
NAME =
'A_1'
Now let's do another observation on usage of indexes in execution plan by adding a new column to the table,
ALTER
TABLE
TEST ADD
SURNAME VARCHAR(100)
Now, we have no index on the column "SURNAME". Let's run the below query,
SELECT
*
FROM
TEST
Clustered Index Scan is performed because we are querying all the columns out of which "SURNAME" has no index and optimizer has to rely on clustered index to retrieve its data. Here are some more observations,
SELECT
ID,NAME
FROM
TEST --
PERFORMS NON CLUSTERED INDEX SCAN
GO
SELECT
ID,SURNAME
FROM
TEST --
PERFORMS CLUSTERED INDEX SCAN
GO
SELECT
NAME,SURNAME
FROM
TEST --
PERFORMS CLUSTERED INDEX SCAN
GO
SELECT
*
FROM
TEST WHERE
NAME =
'A_1'
--
PERFORMS NON CLUSTERED INDEX SEEK AND KEY LOOKUP IN CLUSTERED INDEX
FOR "SURNAME" COLUMN
GO
SELECT
NAME,SURNAME
FROM
TEST WHERE
NAME =
'A_1'
--
PERFORMS NON CLUSTERED INDEX SEEK AND KEY LOOKUP IN CLUSTERED INDEX
FOR "SURNAME" COLUMN
GO
SELECT
SURNAME FROM
TEST --
PERFORMS CLUSTERED INDEX SCAN
GO
SELECT
ID FROM
TEST --
PERFORMS NON CLUSTERED INDEX SCAN
GO
SELECT
NAME FROM
TEST WHERE
ID =
1 --
PERFORMS CLUSTERED INDEX SEEK
GO
SELECT
SURNAME FROM
TEST WHERE
NAME =
'A_1'
--
PERFORMS NON CLUSTERED INDEX SEEK AND KEY LOOKUP IN CLUSTERED INDEX
FOR "SURNAME" COLUMN
Try any other combinations I missed here and let me know through your comments.
Now, we observed when the optimizer is going for Clustered and when for Non Clustered. There is one more index called "Include Index" in used terminology which is called "Include Column" technically. It doesn't let optimizer go for Clustered Index hence reducing disk I/O and increasing query performance. I will write about it in my next article.
Now it's time for clean up,
DROP
TABLE
TEST
GO
USE
master
GO
DROP
DATABASE
YASHWANTH
I feel SQL server is the best and most useful application which help in solving database problems up to a large extent.
ReplyDeleteSQL Server Load Soap API