SQL Server 2005 came up with a facility to extend the Non Clustered Index to cover another column and such column is called "Included Column". The column on which the index is created is called "Key Column" and including a non key column to it adds it to the leaf level of non clustered index. There is no restriction that key column and non key column of the index should be of same data type. It has also an advantage that including non key column does not increase the size occupied by index.
An included index increases performance with less I/O reads when all the columns included in the index is used in the SELECT list. If so, the optimizer never goes to clustered index thus decreasing I/O reads. Let's see a scenario of included columns.
Create a table and populate some records in it,
CREATE
TABLE
INCLUDE_COLUMN
(ID
INT,NAME
VARCHAR(100),CONSTRAINT
PK_INCLUDE_COLUMN PRIMARY
KEY
(ID))
GO
DECLARE
@ID INT
=
1
WHILE
(@ID
<=
1000)
BEGIN
INSERT
INTO
INCLUDE_COLUMN SELECT
@ID,'A_'+CAST(@ID
AS
varchar)
SET
@ID =
@ID +
1
END
GO
Turn on "Actual Execution Plan" and run the below query,
SELECT
*
FROM
INCLUDE_COLUMN
As known, optimizer uses clustered index created along with primary key. Now, add a column to the table and populate some values in it,
ALTER
TABLE
INCLUDE_COLUMN ADD
NAME1 VARCHAR(100)
GO
UPDATE
INCLUDE_COLUMN SET
NAME1 =
CAST(ID
AS
varchar)+NAME
Now create a non clustered index on the column "Name" and include the newly added "Name1" column in that index as follows,
CREATE
NONCLUSTERED
INDEX
IX_INCLUDE_COLUMN_NAME ON
INCLUDE_COLUMN
(NAME)
INCLUDE
(NAME1)
If we query the entire table now, optimizer may use clustered index or non clustered index depending on its feasibility, as stated in our previous article. But Books Online states that optimize does not go for clustered index only if all the included columns of non clustered index are used in SELECT list. Right now, we have all the columns covered in some indexes. So, to test the performance of included columns, let's add another column to our table.
ALTER
TABLE
INCLUDE_COLUMN ADD
SURNAME VARCHAR(100)
GO
UPDATE
INCLUDE_COLUMN SET
SURNAME =
NAME +
NAME1
GO
Now if the whole table is queried then clustered index is used for sure. Let's see in what scenario, clustered index is avoided.
SELECT
ID,NAME,NAME1
FROM
INCLUDE_COLUMN
This query goes for non clustered index as we have included all the indexed columns in our SELECT list. As it claims to have lesser I/O reads, let's force the optimizer to use our clustered index and compare the costs for both the queries.
SELECT
ID,NAME,NAME1
FROM
INCLUDE_COLUMN
GO
SELECT
ID,NAME,NAME1
FROM
INCLUDE_COLUMN WITH
(INDEX(PK_INCLUDE_COLUMN))
Compare the I/O cost, CPU Cost, Operator Cost and CPU Cost and you can notice that using Non Clustered Index incurs lesser cost than that of clustered index. To use clustered index, optimizer needs to access disk which requires more amount of CPU resources. So, include columns are introduced. This gives more performance when you have a huge amount of data in your table.
Remember one thing,
SQL Server Query Optimizer knows the best way of retrieve the data that gives good performance.
I feel SQL and other similar and useful aspects actually provide more and more easy to use and learn solutions.
ReplyDeleteSQL Server Load Soap API