Kieran Senior

UNION ALL Versus Sub-Queried JOIN In MS SQL 2008

A recent to shift from MS SQL 2000 to MS SQL 2008 was required, and assuming everything would go swimmingly, with greater speed increases, and improved development with the integration of Intellisense to the SQL management tools I thought all would be well. That is until users starting hitting a page which ran a stored procedure. Upon further investigation it was found that the queries within the stored procedure were causing the server to max out for at least five minutes at a time, hindering from all usage of any web pages. Odd, seeing as these stored procedures were run on a daily basis on the old server, and a little fore-sight was taken into account for the compile-time of each stored procedure as they were migrated over to the new server. Unfortunately this wasn’t the case. Whatever happened in the development of SQL in this eight-year period changed the way queries work. Consider an example where a join encompasses a nested select, cumbersome sure, but capable, even with a table composed of over half a million entries. Unfortunately this will take a long, long time. Once run once the query will run just time again, perhaps because the data is placed higher in the SQL stack.

Moving forwards, this was tested over and over again only to find the same results. Using CTE’s, views, or temporary tables had no effect on the query either. Colleagues then attempted alternative approaches to the same query where eventually we reached the use of union. To our surprise the query worked perfectly, and executed in quite literally 5 seconds, compared to the 5 minutes in previous attempts.

SELECT
    Column1,
    Column2
FROM
    MyTable
WHERE
    Column2 LIKE '%exp%' OR
    Column1 IN (SELECT Column1 FROM MyOtherTable)

Compared with:

SELECT
    Column1,
    Column2
FROM
    MyTable
WHERE
    Column2 LIKE '%exp%'
UNION ALL
SELECT
    Column1,
    Column2
FROM
    MyTable
WHERE
    Column1 IN (SELECT Column1 FROM MyOtherTable)

The latter example will run much, much faster. Logically they are the same, and produce the same results, but in fact the difference in performance is substantial. The only problem is that the better performing query is a much disliked one also.

blog comments powered by Disqus