Sunday, April 10, 2016

SQL Server - Indexes with Included Columns

SQL Server 2005 came up with a facility to extend the Non Clustered Index to cover another column and such column is called "Included Column". The column on which the index is created is called "Key Column" and including a non key column to it adds it to the leaf level of non clustered index. There is no restriction that key column and non key column of the index should be of same data type. It has also an advantage that including non key column does not increase the size occupied by index.

An included index increases performance with less I/O reads when all the columns included in the index is used in the SELECT list. If so, the optimizer never goes to clustered index thus decreasing I/O reads. Let's see a scenario of included columns.

Create a table and populate some records in it,

CREATE TABLE INCLUDE_COLUMN (ID INT,NAME VARCHAR(100),CONSTRAINT PK_INCLUDE_COLUMN PRIMARY KEY (ID))
GO
DECLARE @ID INT = 1
WHILE (@ID <= 1000)
BEGIN
INSERT INTO INCLUDE_COLUMN SELECT @ID,'A_'+CAST(@ID AS varchar)
SET @ID = @ID + 1
END
GO

Turn on "Actual Execution Plan" and run the below query,

SELECT * FROM INCLUDE_COLUMN

As known, optimizer uses clustered index created along with primary key. Now, add a column to the table and populate some values in it,

ALTER TABLE INCLUDE_COLUMN ADD NAME1 VARCHAR(100)
GO
UPDATE INCLUDE_COLUMN SET NAME1 = CAST(ID AS varchar)+NAME

Now create a non clustered index on the column "Name" and include the newly added "Name1" column in that index as follows,

CREATE NONCLUSTERED INDEX IX_INCLUDE_COLUMN_NAME ON INCLUDE_COLUMN (NAME) INCLUDE (NAME1)

If we query the entire table now, optimizer may use clustered index or non clustered index depending on its feasibility, as stated in our previous article. But Books Online states that optimize does not go for clustered index only if all the included columns of non clustered index are used in SELECT list. Right now, we have all the columns covered in some indexes. So, to test the performance of included columns, let's add another column to our table.

ALTER TABLE INCLUDE_COLUMN ADD SURNAME VARCHAR(100)
GO
UPDATE INCLUDE_COLUMN SET SURNAME = NAME + NAME1
GO

Now if the whole table is queried then clustered index is used for sure. Let's see in what scenario, clustered index is avoided.

SELECT ID,NAME,NAME1 FROM INCLUDE_COLUMN

This query goes for non clustered index as we have included all the indexed columns in our SELECT list. As it claims to have lesser I/O reads, let's force the optimizer to use our clustered index and compare the costs for both the queries.

SELECT ID,NAME,NAME1 FROM INCLUDE_COLUMN
GO
SELECT ID,NAME,NAME1 FROM INCLUDE_COLUMN WITH (INDEX(PK_INCLUDE_COLUMN))

Compare the I/O cost, CPU Cost, Operator Cost and CPU Cost and you can notice that using Non Clustered Index incurs lesser cost than that of clustered index. To use clustered index, optimizer needs to access disk which requires more amount of CPU resources. So, include columns are introduced. This gives more performance when you have a huge amount of data in your table.

Remember one thing,

SQL Server Query Optimizer knows the best way of retrieve the data that gives good performance.

1 comment:

  1. I feel SQL and other similar and useful aspects actually provide more and more easy to use and learn solutions.

    SQL Server Load Soap API

    ReplyDelete