Sunday, June 29, 2014

ACID Properties in RDBMS

If there exists something with name Database then it should some properties. First was Codd's Rules. Then it should be in such a way that it undergoes Normalization. But these two come under creation segment of database mainly. After database is designed, data modifications are done obviously. Such data modifications in RDBMS are done through Transactions. Every update, insert etc are considered as a transaction. These transactions ensure the consistency of your database. Every transaction should adhere to some properties called ACID properties. ACID is an acronym for Atomicity, Consistency, Isolation and Durability. In this post, we'll see about ACID.

Atomicity


A Transaction should be atomic in nature. This means, multiple changes to the database are executed as a single unit by transaction. If a transaction executes 'n' number of changes to database then all should be committed if everything goes right, otherwise no single change should be committed if anything goes wrong in the transaction. If one change is committed and remaining are not committed then such transaction is not said to be atomic because this results in inconsistent state of database. The ability of committing or rolling back a transaction is achieved by Atomicity.

Consistency


Every transaction should leave the database in a consistent state after its completion (A transaction is said to be completed if it is committed or rolled back). The best scenario to explain this property is given in Microsoft SQL Server 2012. I'm reproducing the same scenario here.

Assume there are two tables Table1 and Table2. If Column1 of Table1 is used as foreign key in Table2 then there shouldn't be a scenario where a transaction cannot update column1 in Table1 with a value but it updates the same column in Table2 with the same value. This results in inconsistency of database as there is a violation of Foreign Key.

When a transaction starts, it can take the database into inconsistent state but at its completion it should bring database back to consistent state.

Isolation


The word ISOLATION itself states that a transaction should be independent from other transactions. One transaction should not affect another transaction. RDBMS like SQL Server has different isolation levels like READ COMMITTED,READ UNCOMMITTED, SERIALIZABLE etc. By default database resides in level READ COMMITTED. This means if a transaction is making changes to data then another transaction cannot even read that data until the previous transaction gets completed. Transactions impose locks at table level, row level to ensure that no other transaction can red or modify the data that is being changed by them. Other RDBMS like Oracle, MySQL have locking mechanism imposed on tables.

Durability


Durability is a property which ensures the changes made by a committed transaction reside stable in the database. This can be achieved by writing the changes to a stable disk. For this, there exists a Transaction Log which records every transaction made in the database. Changes made are recorded in Transaction Log before writing them to the disk.

Sunday, June 22, 2014

Database Normalization

Next to Codd's Rules, this is the concept a DBA should be aware. A database is normalized to avoid redundancy of data. For this, large tables are broken into small tables and relationships are defined among them. This avoid anomalies when making additions, deletions and updates to the data.

Edgar F. Codd defined normalization forms in database which was later extended by Raymond F. Boyce. Codd specified three normal forms namely First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF). Later Codd and Boyce specified Boyce-Codd Normal Form which is the next level of 3NF.

There are several normal forms defined,

    1) First Normal Form (1NF)
    2) Second Normal Form (2NF)
    3) Third Normal Form (3NF)
    4) Fourth Normal Form (4NF)
    5) Fifth Normal Form (5NF)
    6) Sixth Normal Form (6NF)
    7) Boyce-Codd Normal Form (BCNF)
    8) Elementary Key Normal Form (EKNF)
    9) Domain/Key Normal Form (DKNF)

Among the above normal forms, 1NF, 2NF, 3NF and BCNF are the most important. Most of the organizations normalize their database up to BCNF. The remaining are the higher levels of normalization. If you reach 3NF then your database is said to be normalized. In this post, we will discuss about 1NF, 2NF, 3NF and BCNF.

Before that let's have a glance at the technical terms used in Normalization concept,

Tuple - A tuple is a row or a record of the table.

Candidate Key - A candidate key is a primary key defined on single column of the table.

Super Key - A composite primary key is called a Super Key.

First Normal Form (1NF)


According to Codd, a table is said to be in 1NF if each attribute contains atomic values. This means each column of the table should contain only one value. Data in the table should be organized into rows. When data is fetched from the table, there shouldn't be multiple rows displaying same data from the set of columns. There should exist a Primary Key which distinguishes rows as unique among them. This Primary Key column shouldn't have duplicate values.

Consider a scenario where there is a table that shows the information about film actors acted in different films as below,

Actor Code
Name
Film
JD
Johnny Depp
Pirates of Carribbean, Blow
GC
George Clooney
Gravity, Oceans Eleven

The above table shows the information about actors acted in various films. It violates 1NF as it contains multiple values in a row for single column. If that has to be organized then it should be as follows,

Actor Code
Actor
Film
JD
Johnny Depp
Pirates of Carribbean
JD
Johnny Depp
Blow
GC
George Clooney
Gravity
GC
George Clooney
Oceans Eleven

Now the table is organized in such a way that in each row there is only one value for a column. But this still violates 1NF in the aspect of Primary Key. In the above table, the column Actor Code can be considered as Primary Key as it remains unique though there exists another actor with the same name, he will be given a different code. 1NF states that column defined as Primary Key has unique columns. To have that uniqueness, the above table can be broken into two tables, one being Actor and the other being Films as below,

                               Actor
Actor Code (PK)
Name
 JD
Johnny Depp
GC
George Clooney







                           Films
Actor Code (FK)
Film
JD
Pirates of Caribbean
JD
Blow
GC
Gravity
GC
Oceans Eleven










Now there are two tables namely Actor and Films which are in 1NF. In the table Actor, Actor Code is the primary key which is a Foreign Key in the table Films.

Second Normal Form (2NF)


According to Codd, a table is in 2NF if it is already in 1NF and no non-prime attribute is dependent on any proper subset of the candidate key. In database, multiple columns can be concatenated as a single primary key. Codd's statement means that every column other than the columns defined in primary key must be dependent on all the columns of primary key.

If there is a single Primary Key then there is no violation of 2NF. If there is a composite primary key then the non-prime attributes should be in such a way that they should depend on the entire composite primary key. i.e., on all the columns used in the primary key.

Consider the following table,

Actor Code (PK)
Film (PK)
Actor
Genre
JD
Blow
Johnny Depp
Crime
GC
Gravity
George Clooney
Sci-Fi
BP
Fight Club
Brad Pitt
Drama

Here there is a composite primary key containing Actor Code and Film columns. But the remaining columns Actor and Genre do not depend on the entire primary key. Actor depends only on Actor Code whereas Genre depends only on Film. This states that there is a partial dependency of non-prime attributes on the primary key which violates 2NF. To have this table in 2NF, it can be split into two with names Actor as one table and Film as another table between whom there can be a relation.

              Actor                                         Films

Actor Code (PK)
Actor
JD
Johnny Depp
GC
George Clooney
BP
Brad Pitt
Film (PK)
Genre
Blow
Crime
Gravity
Sci-Fi
Fight Club
Drama


Now both tables are in 2NF. A relation can be established between them by either taking Film as Foreign Key in Actor table or Actor Code as Foreign key in table Film.

Third Normal Form (3NF)


This is the next level of 2NF. Codd stated that, a table is said to be in 3NF when it is already in 2NF and there exists no transitive dependency among the attributes of table. This means there is no dependency among columns of table such that Column2 is dependent on Column1 and Column3 is dependent on Column2. So, here Column3 is dependent on Column1 through Column2.

Consider the following table which shows the information about Academy Award winners for different years,

Category (PK)
Year (PK)
Winner
Date of Birth
Best Picture
1992
Clint Eastwood
May 31, 1930
Best Actor
1993
Tom Hanks
July 9, 1956
Best Director
1993
Steven Spielberg
December 18, 1946

Assume there is a composite primary key combining Category and Year columns. There exists a Transitive Dependency violating 3NF. The Column Winner is dependent on primary key columns whereas Date of Birth is dependent on Winner. So, Date of Birth is dependent on primary key columns through the column Winner where violation occurs. To have the table in 3NF, it can be split into two as Dates of Birth and Oscar Winners. There can be a foreign key relationship between the two tables as below,

      Dates of Birth
Winner (PK)
Date of Birth
Clint Eastwood
May 31, 1930
Tom Hanks
July 9, 1956
Steven Spielberg
December 18, 1946

                                        Oscar Winners
Category (PK)
Year (PK)
Winner (FK)
  Best Picture
1992
Clint Eastwood
Best Actor
1993
Tom Hanks
Best Director
1993
Steven Spielberg

Now transitive dependency is removed and the tables are in 3NF.

Boyce - Codd Normal Form (BCNF)


When your database tables reach 3NF it means that your database is normalized. BCNF is a higher level than 3NF which reduces more anomalies in your database tables.

According to Boyce and Codd, a table is said to be in BCNF if it is already in 3NF and there is no overlapping of multiple candidate keys. 

Consider the case of a person's information. A person can be identified by his Driving License or by his Passport or PAN. BCNF states that every attribute should describe every part of the Super key as the above three keys are combined as Super Key. If an attribute describes one of the candidate keys but not another candidate key then it violates BCNF. This means, if a person has Driving License, Passport and PAN then entity is in BCNF. If he doesn't have any of the above then the entity is not in BCNF.

This is all about important normal forms to be aware of. If you want to acquire knowledge on other specified normal forms, follow the following link,

http://en.wikipedia.org/wiki/Database_normalization

Sunday, June 15, 2014

How to mess up your MySQL Server in Windows OS?

Messing up things is not new to us humans. In this post, I'll tell you how to mess up your MySQL Server if you're frustrated with it. I'll also tell you how to recover your messed up MySQL instance when you realize your mistake.

Before doing that, we need to know the table storage architecture of MySQL. It has a Files per Tablespace architecture. It means MySQL keeps two files per table on the disk. These files have the same name as its respective table but different extensions (formats). One file is of format .frm and the other is of .ibd. The file with former format contains the schema information of the table like columns, constraints, primary key and foreign keys defined in the table whereas the latter contains the data, indexes of the table. Before MySQL 5.6.7, all the tables used to reside in System Tablespace. This contains metadata of Storage Engine and can be usually found in the following path,

C:\Program Files\MySQL\data\mysql

C:\Program Files\MySQL\data\performance_schema

As the System Table space gets increased in size with the increasing data in the server, from version 5.6.7, InnoDB tables are stored in different filespace. If you install MySQL in your C:\ drive, it will be found in the following path,

C:\ProgramData\MySQL\data

The folder "ProgramData" is hidden by default. In this location, you can find your databases and the tables in the databases.

Take a backup of your whole instance. If you're using MySQL Enterprise then Enterprise Backup can be taken as shown here. If you're using Community Edition then backup can be taken as shown here.

Now let's see how to mess up MySQL Server. Go to the above data directory and open one database folder. 

If you delete the .frm and .ibd files of any table then you can't query that table from your MySQL Editor. It says the table doesn't exist. 

There is another major messing technique of MySQL. If you open any .frm file in NOTEPAD/WORDPAD, you can find the column names of your table. Though it is not clear, still you can find your schema roughly. If you open .ibd file, you can find the table data in the same rough manner. 
If you try to edit any .ibd file then it says "Another process is accessing it" . This process is your MySQL Instance. So, type "services.msc" in Run and stop MySQL Service. Now come back to your .ibd file and edit any word you can read clearly (That word is your table data). Save the .ibd file. Now go back to Services and start MySQL Service. It displays the following dialog box,


  

This means the whole MySQL instance is corrupted. To get your instance back, do the following actions:
  • Uninstall the existing MySQL Instance and delete all its corresponding folders from your system.
  • Install an instance again.

Recovering an Enterprise Backup


In MySQL Workbench, go to the option Backup Recovery under MySQL Enterprise section, which directs you to Backup - Recovery Wizard.




If you've taken any backup before in the same directory where you've configured backup, you can select that backup profile and click next

If you have the backup folder anywhere you can select the option Restore From Folder below. It allows you to select that folder. But this option works only when there exists a folder meta in which a file with name backup_variables.txt. This file contains the starting Log Sequence Number (LSN) and ending LSN which is important in recovering a database. If this file doesn't exist then this option doesn't work. This file is generated by InnoDB automatically while generating backup.

Suppose you have a backup created in the same directory you configured backup, you can get the field backup profile got filled automatically,



Now click next to restore the selected backup profile,


The list of tables in database(s) are shown here. If you check Show System Schemas then it shows the system databases which will be recovered. If there is no display of the system shcema then also they will be recovered. This is because the InnoDB system tablespace holds metadata about InnoDB tables from all databases in an instance. This information regarding the restoring databases is also restored to the instance.

Now click Next. This interface shows the steps to be performed for restoring operation. Click on Restore. It shows a warning that MySQL will be restored to the position when backup was made. This means if there exist the database(s) with the same name as the database(s) getting restored now, they will be replaced with the backed up objects. This may result in the loss of newly made changes to the database.


Click on Restore now which starts the restore operation. You can also see what's going on by clicking on Show Logs button,


One of the steps is Shut Down MySQL Server. This means, the backup is restored after shutting down the server. After shutting down, all the backup is copied to the data directory. This is one of the disadvantages with MySQL. Suppose you have not messed up the whole instance and there comes a situation to restore a particular database. If you have multiple databases in your instance then all the users who are working on other databases lose connection to the instance as it is shut down. It won't be restarted until the restoration of backup is completed. It's better to use this option when you have to restore the whole instance.

Click on Done after the whole operation which brings your instance back.

Recovering through Data Import


I've specified a disadvantage with MySQL while restoring database. To avoid that, restoring through Data Import is useful. Take a backup before by using Data Export. Now use the option Data Import which doesn't affect other databases. This is the better option to use when you have to restore only a single database.

The procedure of Data Import is shown in one of my previous posts,


How to get back an instance when no backup was taken before?


You may have a situation where you don't have any backup taken before but you've corrupted the instance.

There is no solution for this problem other than taking a backup before. After corrupting the instance, if you copy all the database folders to another location and then you copy all of them into the newly installed data directory then those databases are shown in the object explorer but you cannot get any data from them.

The reason for this is, there exists a difference in metadata of the newly created instance's databases InnoDb tables and the older database's InnoDb tables. So this trick will not work.

That's why BACKUP keyword plays a vital role in RBDMS. If you mess up your server without a backup then you have to start your database design from the foundation.

MySQL Workbench Enterprise Backup

MySQL Enterprise Backup is a package that comes with the Enterprise Edition, used for providing HOT BACKUPS. 

Hot Backup is a backup which is taken while database is online. If any changes are made to the database while backup operation is in progress, all those changes are taken into the backup. If the transaction is rolled back then those changes are ignored. This Hot Backup operation occurs for InnoDB Tables. For the tables that use other storage engines like MyISAM, warm backups are taken where database lies online but no change can be made to any database object while backup operation is in progress.

The MySQL Enterprise Backup offers different kinds of backups like Full and Incremental Backups. The whole instance can be taken a backup or a single database can be taken a backup. These backup operations can be scheduled. Every backup operation is considered as a Job.

This MySQL Enterprise Backup comes as a package which can be downloaded from here. When it is installed on your computer, it creates an executable file mysqlbackup.exe which is used to backup your database. After installing it, settings should be configured in Workbench. After installing, clicking on Online Backup takes you to the settings page.




In the above screenshot, there are three prerequisites which have to be configured.
  • The first setting is providing the path to MEB Executable. You need to provide the path where the executable file mysqlbackup.exe exists. This lies in the directory where you installed the MySQL Enterprise Backup package.
  • Second setting is creating a directory to store the generated backups. Provide any location on your computer. Give the folder name as you wish. If that folder doesn't exist in the specified location then you can click on Create Directory button which creates a folder in that location.
  • In MySQL, different operations are carried out by different accounts which have permissions to perform specific operations. For backup, there is a dedicated account called mysqlbackup. While configuring this for the first time, you can create this account with a password.


After configuring all the options, click OK which lets you to go to Backup Wizard.

  • Click on New Job.
  • You can give a name to this job and can add comments to it.
  • Down to it, there are three tabs Schedule, Contents and Options.
  • In Schedule tab, you can schedule when a full backup should occur and when an Incremental backup should occur.
  • In Contents tab, you can select either Full Instance for backup or a specific database for a backup. There comes a warning when you select partial database backup. If I write about it here, this post will get lengthier. For more information about Partial Backup Warning, click here.
  • In Options tab, you have options of changing the backup directory, Compressing Backup (which comes only for non-incremental backups) and Applying log after backup.

I'll post about how to take backup and recover the backup in my future posts.