Wednesday, December 25, 2013

Joins in SQL Server

Joins are very useful in situations where you need to get data from different tables. Joins are written based on columns from both the tables. A column from each table is taken for writing the join.

The following are the kinds of joins used in SQL Server

                                   1. Inner Join
                                   2. Outer Join
                                   3. Cross Join
                                   4. Self Join
                                   5. Comma Join

Let's go inside each join with a practical explanation.

Inner Join


Inner Join is a join which returns the matched rows from both the tables. The matching is done based on a column in each table. It ignores NULL values. Though there are NULL values in both the tables, it doesn't match them.

Let's see this in practical:

To do research on different features, Microsoft provides a sample database with tables and data inserted into it. namely AdventureWorks. Let's do our Join adventures on that database.

USE AdventureWorks2012
GO
SELECT * FROM Person.Person
GO
SELECT * FROM Person.PersonPhone

There are two tables like I specified above. If I run the above code, I get results as:


You can see that there is a common column among both the tables by name BusinessEntityID.Let's join both the tables based on this column.

SELECT p.BusinessEntityID,p.PersonType,p.FirstName,p1.PhoneNumber,p1.ModifiedDate
FROM Person.Person p
INNER JOIN Person.PersonPhone p1
ON p.BusinessEntityID = p1.BusinessEntityID 
WHERE p.BusinessEntityID < 10

I need BusinessEntityID, PersonType, FirstName columns from table Person.Person and PhoneNumber, ModifiedDate from table Person.PersonPhone. I've chosen to join those table based on BusinessEntityID column to get my desired result. Note that in the key word 'INNER JOIN', the word INNER is optional. If you don't specify it, Query Optimizer automatically performs INNER JOIN. But as a standard and to make your fellow users understand whay kind of join you performed, specify INNER keyword.

Now see the Result Pane after executing the above query:



You can see in the result pane that the result is acquired by joining the specified columns from both the tables like a single table. 

As I said above, Inner Join ignores NULL values though they exist in both the tables. Let's see a small case of this:

I updated a column value to NULL in both the tables.

UPDATE Person.Person SET BusinessEntityID=NULL WHERE BusinessEntityID=9

UPDATE Person.PersonPhone SET BusinessEntityID=NULL WHERE BusinessEntityID=9

Run the following query:

SELECT p.BusinessEntityID,p.PersonType,p.FirstName,p1.PhoneNumber,p1.ModifiedDate
FROM Person.Person p
INNER JOIN Person.PersonPhone p1
ON p.BusinessEntityID = p1.BusinessEntityID 
WHERE p.BusinessEntityID < 10


See the Result Pane, though the values of column BusinessEntityID in both the tables have same column value NULL, Inner Join didn't display that record. This proves that Inner Join ignores even the matching NULL values.

Outer Join

Outer Joins are of three kinds,

                               1. Left Outer Join.
                               2. Right Outer Join.
                               3. Full Outer Join

Left Outer Join


It joins two tables based on a column but it doesn't work on matching principle like Inner Join. It returns all the rows present in Left Table. If any matching is found between Left and Right Tables it returns those rows. If no matching is found, it returns NULL values.

Let's have a practical:

Run the following query:

USE AdventureWorks2012
GO
SELECT P.Name,P.ProductNumber,pr.ReviewerName,pr.ReviewDate FROM Production.Product p
LEFT OUTER JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductReviewID

Here the Left Table is Production.Product.

Now see the Result Pane


We tried to match based on ProductID column in Production.Product table and ProductReviewID in Production.ProductReview table. There are only four matching records in both the tables. So those four matching records are returned. You can also see that after those four records, all the NULL values are present. This means there are no matching records based on the matched columns. All the records of left table (Product.Product) are displayed.

Right Outer Join


Right Outer Join also follows the same principle as Left Outer Join. The only difference is. if you use Left Outer Join, all the records of Left Table are displayed whereas if you use Right Outer Join, all the records of Right Table are used.

If you want to perform Right Outer Join on the above query, the query is as follows:

USE AdventureWorks2012
GO
SELECT P.Name,P.ProductNumber,pr.ReviewerName,pr.ReviewDate FROM Production.Product p
RIGHT OUTER JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductReviewID

Full Outer Join


Unlike Left Outer and Right Outer Joins, Full Outer Join returns all the rows from all the tables that are joined.

Suppose there are 500 rows in Table1 and 400 rows in Table2. If a Full Outer Join is performed among these tables then you will get 500 rows from table1 and 400 rows from table2. In the Result Pane, you can see that the last 100 rows in table2 are NULL values. In total you get 500 rows in Result.

Use the above query:

USE AdventureWorks2012
GO
SELECT P.Name,P.ProductNumber,pr.ReviewerName,pr.ReviewDate FROM Production.Product p
RIGHT OUTER JOIN Production.ProductReview pr
ON p.ProductID = pr.ProductReviewID

Now see the Result Pane:



Here after 4th row all are NULL values. This is because in the table Production.ProductReview, we have only 4 rows. So we got only 4 rows and remaining all NULLs.

Let's check whether I said right or wrong. Run the following queries as a single batch:

SELECT COUNT(*) AS PRODUCT FROM Production.Product

SELECT COUNT(*) AS PRODUCTREVIEW FROM Production.ProductReview


So, we got 504 records by running the JOIN query.

Cross Join


Cross Join is like a Cartesian Product. It can also be considered as One-To-Many Relationship. Let's have a practical to understand this.

In AdventureWorks2012, we have two tables by names Sales.SalesPerson and Sales.SalesTerritory.

Let's see how many records exist in both the tables by running the following queries:

SELECT COUNT(*) AS SALESPERSON FROM Sales.SalesPerson

SELECT COUNT(*) AS SALESTERRITORY FROM Sales.SalesTerritory


There are 17 and 10 records respectively in both the tables. Now let's perform a cross join among these tables.

SELECT p.BusinessEntityID, t.Name AS Territory
FROM Sales.SalesPerson p
CROSS JOIN Sales.SalesTerritory t

See the Result Pane


Here, you can see that the column BusinessEntityID column values are 274,275....290 are with Territory column value 'Australia'. Later they are with Territory column value 'Canada'. It means a kind of one-to-many relationship is there between Australia and 274,275...290. If you check the number of records obtained in this join, it will be 170 which is the product of number f records from Sales.SalesPerson and Sales.SalesTerritory tables (i.e. 17x10 = 170).

In the above Cross Join query, we didn't use a WHERE clause. If we use it, Cross Join behaves as Inner Join.

Run the following query:

SELECT p.BusinessEntityID, t.Name AS Territory

FROM Sales.SalesPerson p
CROSS JOIN Sales.SalesTerritory t
WHERE p.TerritoryID = t.TerritoryID

See the Result Pane now:


Now let's perform a Inner Join just by replacing the keyword CROSS with INNER.

SELECT p.BusinessEntityID, t.Name AS Territory
FROM Sales.SalesPerson p
INNER JOIN Sales.SalesTerritory t
ON p.TerritoryID = t.TerritoryID

See the Result Pane

                                    

You can see that, Cross Join and Inner Join produced the same result if we use a WHERE clause in Cross Join.

Self Join


If a table is joined to itself then it is called Self Join. This is used if some records of a table should be joined with some other records in the same table. Here we need to use two different aliases for same table in the same query to help Query Processor know which column values should be displayed from which part of the table.

Let's see a practical on this:

SELECT st.BusinessEntityID,sp.SalesQuota, sp.SalesYTD
FROM Sales.SalesPerson AS sp
JOIN Sales.SalesPerson AS st
ON sp.TerritoryID = st.TerritoryID

Here we need to get records matching on the column TerritoryID within the table. You can also consider Self Join as Inner Join within the table. It also ignores NULL values. That's why I said in Inner Join part to specify INNER keyword.

See the Result Pane now:


Comma Join


One more join in SQL Server is Comma Join. It doesn't follow the principles of above joins. It just needs tables to join and get the result. It doesn't even require columns for matching. Just a Comma (,) is needed. Here's an example of Comma Join:

SELECT * FROM table1,table2

Now see the Result Pane:


The columns firstname and lastname belong to table1 while the remaining belong to table2. Result is acquired just by joining those tables but not matching any column. That's about Comma Join.


I just tried to explain my best about joins with practical examples. Please follow the official documentation of Microsoft for the fundamentals of Joins.

1 comment: