Sunday, August 2, 2015

Hash Partitioning in MySQL

Hash Partitioning is a special type of partitioning which ensures even distribution of data among all created partitions. This should be done explicitly while creating table in other partitioning types like "Range" and "List" partitions.

Hash Partitioning is of two kinds,
  • Regular Hash Partitioning.
  • Linear Hash Partitioning.

In this article, we'll see each partitioning method in detail,

Regular Hash Partitioning


Let's create a table with Hash Partitioning,

CREATE TABLE employees (
     id INT NOT NULL,
     fname VARCHAR(30),
     lname VARCHAR(30),
     hired DATE NOT NULL DEFAULT '1970-01-01',
     separated DATE NOT NULL DEFAULT '9999-12-31',
     job_code INT,
     store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

This statement creates a tables with four partitions namely p0,p1,p2 and p3. This can be noticed by running the below query against "information_schema",

SELECT TABLE_SCHEMA,
       TABLE_NAME,
       PARTITION_NAME,
       PARTITION_ORDINAL_POSITION,
       PARTITION_EXPRESSION,
       TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'yashwanth' AND TABLE_NAME = 'employees';

The names of partitions cannot be changed as it is decided by MySQL. Now let's insert some rows into it to see how Hash Partitioning works,

INSERT INTO employees VALUES
(1,'Yashwanth','Aluru','2013-09-11','2015-09-11',1,1),
(2,'Yashwanth','Aluru','2014-09-11','2015-09-11',1,1),
(3,'Yashwanth','Aluru','2015-09-11','2015-09-11',1,1);

Observe that we partitioned the table based on the Year part of "hired" column. So we inserted rows with three different years. To see how these rows are inserted into different partitions, run the above query against "information_schema" again,









Three partitions p1,p2 and p3 has one row each. To allocate like this, MySQL has a calculation which is modular function. Suppose the expression defined for partition is "expr" and the number of partitions made is "n" then the partition in which the inserted row is allocated can be calculated as,

N = mod(expr,n)

Here,

For Row - 1, N = mod (2013,4) = 1
For Row - 2, N = mod (2014,4) = 2
For Row - 3, N = mod (2015,4) = 3

The values 1,2,3 does not imply "Partition Ordinal Position" obtained from above query. Instead it implies partition name p(n). This means the row with year 2013 is allocated in p1, 2014 in p2 and 2015 in p3.

This is how Hash Partition works. We've seen how to create a table with Hash Partition. To partition an existing table with Hash Partition,

ALTER TABLE employees PARTITION BY HASH(YEAR(hired)) PARTITIONS 4;

Linear Hash Partitioning


Linear Hash Partitioning differs from Regular Hash Partitioning in the algorithm used for allocating Partition for inserted rows. As Regular Hash Partitioning uses modular function, this uses "Linear Power of Two" Algorithm.

To demonstrate this, I'm using the same "employees" table used above,

CREATE TABLE employees (
     id INT NOT NULL,
     fname VARCHAR(30),
     lname VARCHAR(30),
     hired DATE NOT NULL DEFAULT '1970-01-01',
     separated DATE NOT NULL DEFAULT '9999-12-31',
     job_code INT,
     store_id INT
)
PARTITION BY LINEAR HASH( YEAR(hired) )
PARTITIONS 4;

Insert some rows in it,

INSERT INTO employees VALUES
(1,'Yashwanth','Aluru','2013-09-11','2015-09-11',1,1),
(2,'Yashwanth','Aluru','2014-09-11','2015-09-11',1,1),
(3,'Yashwanth','Aluru','2015-09-11','2015-09-11',1,1);

Run the following query,

SELECT TABLE_SCHEMA,
       TABLE_NAME,
       PARTITION_NAME,
       PARTITION_ORDINAL_POSITION,
       PARTITION_EXPRESSION,
       TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'yashwanth' AND TABLE_NAME = 'employees';


As said above, Linear Hash Partitioning applies "Linear Power of Two" algorithm. Let's see how it works,
  • Suppose the partition expression is "expr" and number of partitions made to the table is "n".
  • Now, let the partition in which a record is inserted be "P".
  • Find the next power of 2 greater than num. We call this value V; it can be calculated as:
          • V = POWER(2, CEILING(LOG(2, num)))
  • Set N = F(column_list) & (V- 1).
  • While N >= num:
    • Set V = CEIL(V/ 2)
    • Set N = N & (V- 1)
  • Now, the record inserted has year "2015" based on which the partition allocated for these records out of 4 partitions made, can be followed,
  • Here n=4, hence V = power(2,ceiling(log(2,4))) => V = 4.
  • N = 2015 & (4-1) => N = 2015 & 3 => N = 3.
  • Here the expression 3>= 4 is false. So, no need of entering while loop which means the records with year (hired) "2015" are allocated in partition p3.

This is how Linear Hash Partitioning works. To alter an existing table with Linear Hash Partitioning,

ALTER TABLE employees PARTITION BY LINEAR HASH(id) PARTITIONS 5;

Some Operations on Hash Partitioning


Coalesce Partitions


Hash Partitioning supports coalescing the partitions. This means, if there are 4 partitions for a table and you want to decrease the number by 2 then you can apply this operation on the table. Here is the command,

ALTER TABLE employee COALESCE PARTITION 1;

With this command, the number of partitions in the table "employees" are decreased by 1. If there are any rows present in that partition then they are merged into the remaining available partitions. Partitions are decreased from the last partition. You cannot delete a particular partition.

This coalescing applies to Hash and Key Partitioning only because in these partitioning methods, the partitions are created and rows are allocated by MySQL whereas in other partitioning methods, user defines allocating partitions to rows explicitly.

Advantages of Hash Partitioning

  • Distribution of data is even among all partitions without the intervention of user.
  • Partitions can be coalesced to decrease number of files.
  • Can be used in Sub-Partitioning.
  • No need of defining a primary key on the partition expression column.

Limitations of Hash Partitioning

  • Partitioning expression should represent only integer values. Other types of values cannot be present. For example, you cannot Hash Partition a table on a VARCHAR column.
  • Individual partitions cannot be dropped.
  • Merge, CSV, NDB Cluster, Federated storage engines do not support Hash Partitioning.
  • Two columns cannot be used in partition expression. Instead the sum of two column values can be hashed in Hash Partitioning. For example, PARTITION BY HASH (id+year(hired)) is valid in the CREATE statement of above table "employees". In such case, Hashing is applied to the sum of both the column values with the number of partitions.
For more information on Hash Partitioning, follow the below link,

1 comment:

  1. I feel SQL is the best way of providing solutions to some very complex database problems.

    SQL Server Load Rest API

    ReplyDelete