Sunday, December 1, 2013

NOT IN vs NULL in SQL Server

Recently, I was given a small challenge regarding NOT IN and NULL operators in SQL Server. I was able to crack it after doing some research online. I'd like to share the challenge and its solutions.

I was given two tables with their definitions and INSERT statements for it. It went as follows:


CREATE TABLE table1
(
      firstname VARCHAR(50),
      lastname VARCHAR(50)
);

CREATE TABLE table2
(
      address1 VARCHAR(50),
      address2 VARCHAR(50),
      zip VARCHAR(50),
      name VARCHAR(50)
);

INSERT INTO table1
VALUES ( 'f1', 'l1' ),
              ( 'f2', 'l2' ),
              ( 'f3', 'l3' ),
              ( 'f4', 'l4' ),
              ( 'f5', 'l5' ),
              ( 'f6', 'l6' ),
              ( 'f7', 'l7' );

INSERT INTO table2
VALUES ( 'add1', 'add2', '12300-12', 'f2' ),
              ( 'add3', 'add4', '12300-14', 'f4' ),
              ( 'add5', 'add6', '12300-20', 'f1' ),
              ( 'add7', 'add8', '12330', NULL );

Now I was asked to execute the following query after creating the tables and inserting values into them.

SELECT t1.firstname, t1.lastname
FROM table1 AS t1
WHERE t1.firstname NOT IN ( SELECT t2.name FROM table2 AS t2 );

It returned zero records.



I was asked to explain why zero records are returned and what can be the remedy for this. Then I went reading about NOT IN because on seeing the query I doubted there is something wrong with it. I told a solution which I got.

Remove NOT in the query and then execute, the query goes as follows:

SELECT t1.firstname, t1.lastname
FROM table1 AS t1
WHERE t1.firstname  IN ( SELECT t2.name FROM table2 AS t2 );



Now I don't get zero records. This is because the operator IN performs a kind of MATCHING operation. It is used mainly when sub-querying is used. There are two queries in the above SQL statement. The first statement "SELECT t1.firstname, t1.lastname
FROM table1 AS t1 WHERE t1.firstname" gets all the records of columns firstname and lastname from table1 and the second query ( SELECT t2.name FROM table2 AS t2 ) gets all the records of column name from table2. Now the IN operator residing between those two queries performs a matching operation between the values returned by them and returns the records that are matched. It means the values which are common in in the columns (firstname column in table1 and name column in table2) are returned.

But the person who gave me this challenge was not satisfied. He wanted the result using NOT IN the query. Then I read about the purpose of NOT IN, analyzed why zero records are returned and found one more solution.

The reason for getting zero records is, there is a NULL value in the column "name" of "table2". NOT IN is the negation of IN operator. If IN returns the values that are matched then NOT IN returns the values that are not matched. It means we should the result with records that are not matching in both the tables. Also, I found that NOT IN operator performs a kind of AND operation internally. As per basics of statements, AND returns true if both the input values are true otherwise false. It requires the input a TRUE or a FALSE. But here AND operation fails due to NULL value because NULL is neither TRUE nor FALSE and violates the condition of AND. So NOT IN fails and zero records are returned.

Now the question is "Can we get some records using this NOT IN operator and keeping the NULL value as it is?"

Yes, we can get the records even by having NOT IN in the query. It is possible by using an expression COALESCE. By using this expression, the given query goes as follows:

SELECT t1.firstname, t1.lastname
FROM table1 AS t1
WHERE t1.firstname NOT IN ( SELECT COALESCE(t2.name,' ') FROM table2 AS t2 );

Run this query and you can find some records in the Result Pane. 


Let me tell the purpose of COALESCE expression. It performs a kind of OR operation. As per basics of statements, OR returns TRUE values unless both the inputs are FALSE. Here you can see that I changed the query. After "t2.name" I specified a ' ' which is the indication of empty string in SQL. This was done to effect the COALESCE expression.

The basic function of COALESCE as per Microsoft's Official Document, 

"Evaluates the arguments in order and returns the current value of the first expression that initially does not evaluate to NULL".

It means it ignores the NULL values and returns the values that are NOT NULL. Here it ignores the NULL value of the column "name" of table2 and returns the values that are not equal to NULL. I also found that COLAESCE is equivalent to LEFT OUTER JOIN. It combines all the values of the expression including NULL into a single value and returns the first non-NULL values.

Now the NOT IN operator compares the values returned by both the queries of that SQL statement and returns the values that are not matched.

Finally, to conquer its enemy NULL, NOT IN needed a friend COLAESCE.

No comments:

Post a Comment