Techie-Friendly Blog: February 2014

Thursday, February 27, 2014

Partitioning in SQL Server - Part 2

Please read my previous post Partitioning in SQL Server - Part 1 before reading this post.

In my earlier post, I've discussed about theoretical concept of Partitioning. In this post I will speak about how partitioning is done. Remember the following procedure I'm going to show is available in SQL Server Enterprise.

Let's try different types of tables. To make use of partition, it would be good if the table has huge amount of data. So let's do the same.

USE AdventureWorksDW2012

CREATE TABLE MySampleTable (ID INT PRIMARY KEY, CREATE_DATE DATETIME)

INSERT MySampleTable

SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY O.NAME),O1.create_date

FROM sys.all_objects O

CROSS JOIN sys.all_objects O1

Now 1000000 records are inserted in our table. Partition can be made in different cases. Now I'll show how it is done using SQL Server Management Studio (SSMS).

Step - 1

Before partitioning a table, make sure that files exist for partitioning. There is no need to have the partitions distributed among different file groups but it is recommended because individual maintenance can be taken so that the other file will not get disturbed due to maintenance. So the first step that should be taken is creating a file in a different file group. Let's do that.

Go to Object Explorer and right click on the database in which the table that has to be partitioned exists. Go to Properties and then Files. There you can see the data file and log file of the database. Now click on Add button which adds a row below them.

Give a name for the file under the section Logical Name. Scroll to the right where you find a section Path. There give a path which is different from the drive in which SQL Server is present. You can give a name for the file and scroll back to the left. Under Filegroup section, select <new-filegroup> option in the drop-down which leads you to a new window as shown below.

Specify a name for the new file group. Below are two options. If you select Read-Only then the data in that file group cannot be updated. It can be just read. If you select Default then any data inserted into the table resides in this file group. I will write about these file groups in a different post. For now remember these points. Click on OK after giving the name and then click OK on the previous dialog box which creates a new file group on your disk.

Step - 2

Go to Object Explorer and find the table created. Right Click on it and there exists an option STORAGE. Expand it and click on Create Partition.

Step - 3

A Partition Wizard appears, click on NEXT which then guides you to Select a Partition Column. There select a column. To demonstrate a different kind of partition, choose the one which has datetime datatype.

There are two options with check boxes below. If you check the first option, you can use any existing partition scheme for this table. If you check the second option, all the indexes present on the table are partitioned with the same scheme the present table is being partitioned. For now, I'm skipping both the options.

Click on Next button to go to next step.

Step - 4

The next step is Select a Partition Function,

If you want to create a New Partition Function, type a name for it or else if there is any table partitioned already and you feel that it would be good to use that, click on Existing Partition Function and then select one.

Click on Next after selecting a Partition Function.

Step - 5

The next step is Select a Partition Scheme. The same case as above is with this. Follow the same criteria specified above.

Click on Next button for next step.

Step - 6

Here comes the crucial phase which decides the future of your table, Map Partitions. As we are partitioning on datetime column, the option Set Boundaries is highlighted. Choose a range, Left Boundary or Right Boundary and then click on Set Boundaries button. It opens a new dialog box as below,

As it is a datetime column, SQL Server sets default boundaries with the minimum value (Start Date) and maximum (End Date) value of that column. Based on that, you can select Date Range among the provided five options. Now click on OK in the Set Boundary Values dialog box and go the previous dialog box.

You can find that SQL Server specified the partitions with Boundary Values, Rowcount and Required Space. Now specify the file groups from the drop-down list. You can also find the new file group created, in that list. Select file groups for all the partitions and click on Estimate Storage which gives you the total storage capacity required for the partitions.

In my earlier post on Partition, I specified a case where one file group is specified more. You can see that practically now in the above screenshot.

Click on Next button now.

Step - 7

Now the final step is arrived where you have some options as shown in the below screenshot.

You can create T-SQL script for the above procedure.
You can Run Immediately the above procedure which creates partitions immediately.
You can schedule the partitioning later by specifying a desired date and time.
You can save the script to a SQL file or you can copy the script to Clipboard or a New Query Window.

I selected the default options to explain how partitioning is done. Click on Finish button twice which creates the following T-SQL script for me in a new query window.

USE [AdventureWorksDW2012]

BEGIN TRANSACTION

CREATE PARTITION FUNCTION [MySampleTable_PF](datetime) AS RANGE LEFT FOR VALUES (N'2003-04-08T00:00:00',

N'2004-04-08T00:00:00',

N'2005-04-08T00:00:00',

N'2006-04-08T00:00:00',

N'2007-04-08T00:00:00',

N'2008-04-08T00:00:00',

N'2014-04-08T00:00:00')

CREATE PARTITION SCHEME [MySampleTable_PS]

AS PARTITION [MySampleTable_PF]

([PRIMARY], [PRIMARY], [PRIMARY], [New_Filegroup], [New_Filegroup], [New_Filegroup], [New_Filegroup])

ALTER TABLE [dbo].[MySampleTable] DROP CONSTRAINT [PK__MySample__3214EC27650F907D]

ALTER TABLE [dbo].[MySampleTable] ADD PRIMARY KEY NONCLUSTERED

(

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE CLUSTERED INDEX [ClusteredIndex_on_MySampleTable_PS_635291384639364538] ON [dbo].[MySampleTable]

(

[CREATE_DATE]

)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [MySampleTable_PS]([CREATE_DATE])

DROP INDEX [ClusteredIndex_on_MySampleTable_PS_635291384639364538] ON [dbo].[MySampleTable]

COMMIT TRANSACTION

In the above T-SQL script, half of the script was executed as we discussed before. Now the remaining half script is important to discuss.

The existing Primary Key on the column ID is dropped because Primary Key creates a Unique Clustered Index on the table which is responsible for the physical storage of data on the disk. Here the table is being divided into parts which affects the physical storage, so it is dropped.
Later the Primary Key is created as Non-Clustered.
Then a Clustered Index is created on the partitioning datetime column. The reason for this is, one of the Microsoft recommended strategies that Partitioning column should be a Clustered Indexed Column as it affects the physical storage of the data. As only one clustered index can exist per table, SQL Server created the Primary Key as Non-Clustered in the above step.
Finally the Clustered Index created on datetime column is dropped because it is not the criteria of original table's schema. If we had created clustered index on this column while creating the table the index wouldn't have dropped or had we made partition on the column ID, Primary Key is dropped and later created as Primary Key Clustered.

Run the above T-SQL script, Partitioning is done and it can be checked with the below query,

SELECT * FROM sys.partitions WHERE object_id = OBJECT_ID('MySampleTable')

This is how Partitioning is done using SSMS. As partitioning on a datetime column is a peculiar case, I've written about that. Please try different cases of partitioning so that you will get a better grip on the concept.

Read the official documentation of Microsoft on Partitioning in the below link,

http://technet.microsoft.com/en-us/library/ms188730.aspx

Sunday, February 23, 2014

Partitioning in SQL Server - Part 1

Partitioning in SQL Server is one of the interesting concepts and also a wonderful concept. It is available only in the Enterprise Edition of SQL Server. In this article, I will narrate my experience with Partitioning.

Partitioning means distributing a table across different file groups. These file groups can be present on the same disk or different disks. Generally Partition is done on the table which has large number of records in order to have the faster retrieval of data and facilitate easier maintenance of index. To have a better performance due to partitioning, it is recommended to have the file groups on different disks. To search a record, Query Optimizer goes to the respective partition instead of scanning the whole table.

Partitioning can be done on an existing table or a table can be created with partitions at the creation time. To master this concept, one should be aware of the following components:

Partition Function.
Partition Scheme.
Partition Column.
Aligned Index.
Non-Aligned Index.
Partition Elimination.

Let's know about the above specified components in order,

Partition Function

Creating a Partition Function is the first task a user has to do for partitioning a table. It specifies how many partitions a table can have, with what kind of boundary and also specifies the boundary values.

CREATE PARTITION FUNCTION SAMPLE_PF (INT)

AS RANGE LEFT/RIGHT FOR VALUES (500000,1000000)

In the above T-SQL code, a partition function is created for an INT datatype. There are two values specified namely 500000 and 1000000 which are called Boundary Values. As two values are specified, this means there will be partitions. Two ranges can be specified like LEFT and RIGHT. I have written the both to specify there are two. If LEFT range is specified then the value for INT datatype column <= 500000 reside in first partition and the value for INT datatype <= 1000000 reside in second partition. If RIGHT range is specified then the value for INT datatype column < 500000 resides in first partition which doesn't include 500000. The value for INT datatype column < 1000000 which doesn't include 1000000.

Partition Scheme

After creating a Partition Function, partition scheme has to be created. This is an important part of partition concept. Here the file groups across which the table has to be distributed is specified. While creating Partition Scheme, the Partition Function which is to be used should be specified. It maps the partition function to the specified file groups.

For the above partition column, the partition scheme can be created as follows,

CREATE PARTITION SCHEME SAMPLE_PS

AS PARTITION SAMPLE_PF

TO ([PRIMARY],[Secondary_FileGroup],[PRIMARY])

Here, the reason for specifying three file groups though two boundary values are specified is, specifying two boundary values only. Let me be more clear on this. In the above partition function, we specified two boundary values 500000 and 1000000. So the Query Optimizer thinks there exist some values <= 500000 if it is a left boundary or <500000 if it is right boundary. Hence it needs one file group to be specified. The next boundary value is 1000000 and the same case will be with Query optimizer. Next, though one more boundary is not specified, Query Optimizer thinks there may be some values beyond 1000000. We know that there doesn't exist any value beyond this value but Query Optimizer doesn't. So it asks to specify one more file group. If there are any rows beyond 1000000 then rowcount will be more than '1' or else '1'. I will explain about this in my next article where I will show you practically with examples. If only two file groups are specified in the above Partition Scheme then SQL Server throws an error and insists you to specify one more file group.

Partition Column

Next comes, choosing a column on which partition has to be made. Choosing a Partition Columns is a critical choice because once partition is done based on a column it cannot be changed. If it has to be changed, the table should be dropped and re-created and all the indexes should be re-built. There are some strategies to be followed for a Partition Column:

A partitioning column should not have columns with data types like text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max).
It should be a part of Primary Key because a Primary Key doesn't allow NULL values and facilitates uniqueness of data. If there are NULL values in Partition Column then that record is stored in the left most partition.
It should be a Clustered Indexed column. This is because Clustered Index sorts the physical storage order of the rows. As partition effects the physical storage of table, it is recommended to have a Clustered Indexed column as the Partition Column.
A Partition Column can be a Unique Indexed Column. Like Primary Key it also ensures uniqueness of data but allows a single NULL value. So this strategy is the last recommended one after above two strategies.
Partition Column should be a single column. This means partition cannot be done on two different columns. If a combination of columns can make the best partition then they both should be declared as PERSISTED computed column.
To let the Query Optimizer go into a correct partition for data retrieval, it should be used in the predicate of queries.

Aligned Index

If a table is partitioned it has a Partition Function as well as Partition Scheme. If there is an index on the table and it is also partitioned with the same partition scheme then such index is called Aligned Index.

Non-Aligned Index

If the table and the index have different partition schemes then such index is called Non-Aligned Index.

Partition Elimination

This is one of the advantages with Partitioning. As said earlier, it is better if the partition column is chosen in the query predicate. This facilitates Partition Elimination. It means, when the partition column is used in the query predicate, Query Optimizer goes to the partition where the particular record exists. Therefore all the remaining partitions are removed from its execution plan. Hence query performance increases. So it is always recommended to choose a partition column which can be used in query predicate often.

These are the concepts one should be aware of, while knowing about partitioning. Read the continuation post about partitioning at Partitioning in SQL Server - Part 2

Monday, February 17, 2014

SQL Server - What if the default database is taken OFFLINE or DETACHED?

Sometimes DBAs don't find any work and start poking SQL Server by doing some naughty operations on it. Out of them I've gone through a stupid work.

I connect to my SQL Server instance through default system administrator account SA. By default, the default database is master. But being lazy and to save database switching time I changed its default database to one of my user defined databases. But accidentally I have taken that database OFFLINE. Then my mind was blown away as I wasn't able to connect to my instance and unfortunately I didn't have any other login to access the server.

Then I contacted my senior DBAs and got a solution. Let me show you practically,

As I said, the default database for any login is Master by default. Log on to your instance using SA as we have discussed about it earlier in this post.

EXEC sp_defaultdb 'sa','AdventureWorksDW2012'

Executing the above system stored procedure changes the default database for the login SA from master to AdventureWorksDW2012. Now let's carry on with our experiment by taking it OFFLINE.

ALTER DATABASE AdventureWorksDW2012 SET OFFLINE

Now close the SQL Server Management Studio and re-open it. Login with the credentials of 'sa'. You will get the following error.

Now, resolving this error is a simple task. Just follow the following steps:

Connect to Database Engine.
When it prompts for user authentication, provide the details like Server Name, Authentication Mode, Username and Password.
Now click on Options which expands the connection dialog box.
At the top, you can find a tab by name Connection Properties. Click on that.
In the text box CONNECT TO DATABASE, type master or any other database name you know, existing in the respective server.
Now you can login using 'sa'. Then go to user properties and change the name of the default database.

That's all, your connection to the server is restored. Change the default database as soon as you log on.

Techie-Friendly Blog