Why Programmers will love Common Table Expressions (CTE) in SQL
In this article I will give an introduction to what Common Table Expressions (CTE) are in the SQL Language. I will give some examples on how to use them, and thy they will help you in your SQL journey.
(You can also find this story at my web page eliasnordlinder.com, where I also post many other blog posts about Business Intelligence)
I have a previous Python Background before moving into SQL, and one Best Practice in Python is to conduct Modular Programming. I think that most people that have done some kind of programming know what Modular Programming is.
Modular Programming is a technique that splitting the program into several independent modules where ideally each module should have one function.
Modular systems are more reusable, more easily read and understood, and it is easier to change small part of it without destroying the whole. It also helps to stop repeating the code as several part of the code can reuse the same module.
You might now think:
How is this related to SQL, which is a high level query language
The answer to that questions is, you guessed it, CTE’s
What is Common Table Expressions and what is their history?
To understand Common Table Expressions, one must first explain their “predecessors”, sub-queries and derived tables. Most people that have worked with SQL to a certain extend have worked with these objects.
They are usually used to perform an action on an aggregate measure. One example of a derived table can look like below:
In this example we want to see which department who have a total sales that is greater than the average total sale per department. We have in total three derived tables. The first derived table is the one below:
We are just making a grouping of sales by department, this is to get the left side of our comparison later. The second derived table is as:
We are taking the average sales of the previous derived table, total sales. And yes, as you can see, we have the third derived table (Same as the first one), nested in this derived table:
Problems with derived tables
For me, there are several things that screams “Problem Problem” in this code. The first one is that I don’t think that the first code snipped is not very readable. This is not a complex query, but it is already causing problems.
If the query would be much longer it would cause much more problems.
The second aspect is that we are repeating code, using the same derived table two times. It feels intuitively wrong to repeat code, coming from a programming background, where repeating code is not a good practice.
The query is just getting longer and more complex.
Finally, what if I would like to change something in the derived tables. First, it would hard to find exactly where I should change it. Secondly, I would have to change it multiple times probably.
Solution to Problems -> CTE
Common Table Expressions
Common Table Expressions are a solution to all of these problems above, they are best used when running a sequence of complex expressions.
Below is the same problem as above remade to a CTE, side by side with the original expression.
Difference between CTE and derived Table
For me, the left version looks much cleaner and easier to read than the right version.
- One can easy distinguish the two CTE, tot_s and avg_s
- We only have two CTE, instead of three derived tables as in the right version. This is because we have repeated tot_s two times
- We can easy change things in the CTE above
Different aspects of an CTE
The first aspect of writing a CTE is to use the WITH clause. Yes, they don’t use a SELECT clause as most of the SQL queries usually use. After the WITH clause the name of the CTE is written, followed by an AS.
After this comes the parentheses, and therefore the expression inside the parentheses. As one can see above, the expression inside the parentheses is the same as for the derived table. Below is the steps.
One can also add several CTE after each other by just adding a comma in the end of the first CTE. The second CTE can then start with just the name, skipping the with clause.
The WITH clause is a way to implement the modular programming best practices into SQL. It is an alternative to sub-queries/derived tables and divides the query in several part that all have one purpose.
The result is that the code is easier to read, easier to change, and less repetitive.