Sunday, April 10, 2016

SQL Server - Usage of Indexes in Execution Plan

Indexes are used to have a better performance while retrieving data. I'm not going to refer basics in this article. Rather, I'd like to share my experiences with the well known indexes i.e., Clustered and Non Clustered Indexes and how they are used by SQL Server Query Optimizer in its execution plan.

As the performance of index can be observed when there is a huge amount of data, let's have that data using the following code,

CREATE DATABASE YASHWANTH
GO
USE YASHWANTH
GO
CREATE TABLE TEST (ID INT,NAME VARCHAR(100),CONSTRAINT PK_TEST_ID PRIMARY KEY (ID))
GO
DECLARE @ID INT = 1
WHILE (@ID <= 2000000)
BEGIN
INSERT INTO TEST SELECT @ID,'A_'+CAST(@ID AS varchar)
SET @ID = @ID + 1
END
GO

Now that, we have 20 lakhs of unique records in our table. As we created a table with a primary key column, a clustered index is created automatically. Let's do some operations on this table.

Turn on "Actual Execution Plan" and run the following query,

SELECT * FROM TEST

After the entire table result is retrieved, check the execution plan where we can find this,


As per our knowledge, query optimizer uses the primary key's clustered index and scans it using the operator "Clustered Index Scan". If a "where" clause is used then it uses the same index using "Clustered Index Seek" operator.

We know that among all indexes, Clustered is a high prioritized index and this is why it gets created automatically when a primary key is created on the table. Is it true in all the cases?

Let's create a Non Clustered Index on the column "Name" of our table. You can create a unique or non-unique or ascending or descending. Here, I'm creating a normal non-unique ascending non clustered index,

CREATE NONCLUSTERED INDEX IX_TEST_NAME ON TEST (NAME)

Now run the below query and observe the execution plan,

SELECT * FROM TEST


Though there is a high prioritized clustered index, optimizer used newly created Non Clustered index. We know that clustered index orders table data on the disk according to it. But why does optimizer go for non clustered index instead of clustered index.

Now, let's do some more operations to have our doubts cleared. Clear the procedure cache.

CHECKPOINT
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

We can force the optimizer to use our favourite clustered index instead of non clustered index. Let's try that now. Run the following query,

SELECT * FROM TEST WITH (INDEX(PK_TEST_ID))

My system is of I5 processor with 8GB of physical memory. It used clustered index and retrieved 20 lakh records within 11 seconds. Now run the below query,

SELECT * FROM TEST

It used its favourite non clustered index and retrieved the same result of 20 lakh records within 10 seconds in my system.

Why did this happen?

Because SQL Server is built with query optimization capability without the manual intervention. The optimizer knows it can get result faster when it uses non clustered index and hence it used it. Earlier, it had only clustered index to use and now it has two and an option to decide which can give better performance.

Still confused?

Take the following query,

SELECT ID FROM TEST

Now, it should definitely use the clustered index as we used only the column "ID" in our SELECT list. But execute it and see the result and execution plan,


In my system, I got the result in 9 seconds this time, performing "Non Clustered Index Scan".

Let's see the CPU usage when we run our previous queries individually,

SELECT * FROM TEST
GO
SELECT * FROM TEST WITH (INDEX(PK_TEST_ID))

Observe the Subtree Cost, I/O Cost and Operator Cost for both the queries where we can find that using Non Clustered Index incurred lesser costs. Now, run the both queries at once and see the execution plan to get the percentage wise performance stats,


Using clustered index used 52% of the total 100% execution plan for both the queries. Since SQL Server 2005, query optimizer knows how to use resources to have the better performance.

When used a "where" clause, optimizer uses "Clustered Index Seek" operator for the below first two queries and "Non Clustered Index Seek" operator for the third query,

SELECT * FROM TEST WHERE ID = 1 AND NAME = 'A_1'
GO
SELECT * FROM TEST WHERE NAME = 'A_1' AND ID = 1
GO
SELECT * FROM TEST WHERE NAME = 'A_1'

Now let's do another observation on usage of indexes in execution plan by adding a new column to the table,

ALTER TABLE TEST ADD SURNAME VARCHAR(100)

Now, we have no index on the column "SURNAME". Let's run the below query,

SELECT * FROM TEST


Clustered Index Scan is performed because we are querying all the columns out of which "SURNAME" has no index and optimizer has to rely on clustered index to retrieve its data. Here are some more observations,

SELECT ID,NAME FROM TEST -- PERFORMS NON CLUSTERED INDEX SCAN
GO
SELECT ID,SURNAME FROM TEST -- PERFORMS CLUSTERED INDEX SCAN
GO
SELECT NAME,SURNAME FROM TEST -- PERFORMS CLUSTERED INDEX SCAN
GO
SELECT * FROM TEST WHERE NAME = 'A_1' -- PERFORMS NON CLUSTERED INDEX SEEK AND KEY LOOKUP IN CLUSTERED INDEX FOR "SURNAME" COLUMN
GO
SELECT NAME,SURNAME FROM TEST WHERE NAME = 'A_1' -- PERFORMS NON CLUSTERED INDEX SEEK AND KEY LOOKUP IN CLUSTERED INDEX FOR "SURNAME" COLUMN
GO
SELECT SURNAME FROM TEST -- PERFORMS CLUSTERED INDEX SCAN
GO
SELECT ID FROM TEST -- PERFORMS NON CLUSTERED INDEX SCAN
GO
SELECT NAME FROM TEST WHERE ID = 1 -- PERFORMS CLUSTERED INDEX SEEK
GO
SELECT SURNAME FROM TEST WHERE NAME = 'A_1' -- PERFORMS NON CLUSTERED INDEX SEEK AND KEY LOOKUP IN CLUSTERED INDEX FOR "SURNAME" COLUMN

Try any other combinations I missed here and let me know through your comments.

Now, we observed when the optimizer is going for Clustered and when for Non Clustered. There is one more index called "Include Index" in used terminology which is called "Include Column" technically. It doesn't let optimizer go for Clustered Index hence reducing disk I/O and increasing query performance. I will write about it in my next article.

Now it's time for clean up,

DROP TABLE TEST
GO
USE master
GO
DROP DATABASE YASHWANTH

1 comment:

  1. I feel SQL server is the best and most useful application which help in solving database problems up to a large extent.

    SQL Server Load Soap API

    ReplyDelete