In this article I will explain what Window functions are.
I will first write a short introduction of Window Function, when they were introduced into SQL Server and their overall use cases. After this I will go into a sample of real life scenarios when they are really helpful.
(You can also find this story at my web page eliasnordlinder.com, where I also post many other articles about Business Intelligence)
Window Functions, Background and Timeline
Windows functions were introduced in in SQL Server in 2005. In the beginning they could be aggregated, but no Window frame could be specified.
In 2012 a big step were made, where Window framed were introduced, Offset Window functions (LAST_VALUE, FIRST_VALUE, LEAD and LAG) were added as well as statistical functions (PERCENT_RANK, PERCENTILE_CONT etc.)
From 2012 a lot of changes have been done to add performance while using Window functions, in the future there will probably be even more, as their use case in the analytical spectrum feels very important.
Why Window Functions?
Window functions allow you to perform calculation against sets or rows (Windows), in a very flexible, clear and efficient way.
Window functions uses the OVER clause to be applied to a set of rows and their main purpose is to be used in analytical purposes. Example of use cases is in running totals, moving averages, identify gaps or handle intervals. Below is an Example:
This will take the store_id and customer_id, then create a ranking of this data based on the customer_id as a column rnk. Finally we sort it by the new column rnk to get the final table, as below.
In the OVER clause the exact number of rows that the current rows relates to is specified as well as how it is ordered. Another way to say it is that the Window that the function is defined over is specified by the OVER clause. In this problem, the Window is the Customer_id, and it is therefore the rank is done on this column.
Types of Window Functions
There are several type of Window functions, such as aggregate, ranking, distribution, and offset functions.
- Aggregate functions are functions such as SUM, COUNT, MIN and MAX
- Ranking functions are functions such as RANK, DENSE_RANK, ROW_NUMBER and NTILE
- Distribution functions are functions such as PERCENTILE_CONT, PERCENTILE_DISC, PERCENT_RANK and CUME_DIST
- Finally, Offset functions are functions such as LAG, LEAD, FIRST_VALUE, LAST_VALUE and NTH_VALUE.
Use cases of Window Functions
Some example where Window functions can be used is in cases as:
- De-duplicate data
- Return top n rows per group
- Compute running totals
- Perform operations on intervals
- Identify gaps
- Compute percentiles
- Sorting hierarchies
Building blocks of Window Functions
To understand Window functions on a more fundamental level, it is important to understand what the Window function is built on. A Window function consists of three main building blocks, Partitioning, Ordering and Framing. I will go through these three in detail below.
Partitioning is included in the WINDOW function by PARTITION BY. The role of the partition part is to restrict the calculation window to only values where the values are the same in partitioning columns as in the current row, an example is below.
The first Window function is without the PARTITION BY, while the second one include it for the store_id. What will happen is that the customer_id will be ranked as before, but they will be ranked for each store_id individually. Look at the result below and see the difference between rnk_all and rnk.
In rnk_all which is not partitioned by anything, the rank is only by customer_id and is therefore identically. In rnk on the other way, which is partitioned by store_id, the rank is different than from customer_id.
If you look at the picture further up, the customer with customer_id 4,6, 8 etc. have a store_id of 2. These will be ranked further down when the first “Partition Window”, which is store_id 1 is done.
Ordering is as it sounds like, the order that the calculation that is done, in the specific partition specified. In the examples above, we had Ascending order after customer_id in our rank functions.
The ordering in aggregate functions is a bit different, instead of working like above, it gives meaning to the framing options. It helps to define restriction of rows in the Window.
Framing is a second way of filtering in the Window Function, restricting the number of rows that are included in the Partition. One can think of it as two delimiters in which the row partition will be between these two delimiters. This can be done with ROWS, GROUPS or RANGE.
With the help of Framing one can do running total calculations for example, by select the frame as from the first row to the current row, and then run a Window summarization function.
Use cases of Window Functions as an BI Analyst
Below are some use cases for Windows functions.
You can also find good youtube videos explaining most of them at this Youtube channel, BeardedDev.
- Running totals/Running Averages
- Rankings with GROUP BY
- Running Differences
1. Running Totals/Running Averages
This is probably the most useful Window Function of all of them.
When to use?
- When you want to calculate how much sales a company have up to a specific date.
Example of the code for the running total and running average below:
The Framing is between the first row and the current row and the window is the Order_Date. The Partition is by Company_ID which means that the running total will be calculated for one company at the time. The first column will be the running total and the second the running average.
Another common way to use Window functions is with ranking.
Example code for Ranking below:
ROW_NUMBER, RANK, DENSE_RANK and NTILE is the four different types of ranking available in Window Functions.
- ROW_NUMBER -> Will just give a numbered list sorted descending from 1-n
- RANK -> Will rank according to order and partition, will skip if equal value
- DENSE_RANK -> Will rank according to order and partition, will not skip if equal value
- NTILE -> Will divide into different groups, if NTILE(5) will the whole dataset be divided in 5 groups depending on the ranking.
3. Rankings with GROUP BY
Another Scenario would be if we would like to rank our customers by their total sales. The code example is as below:
We will first run a normal SUM(Sales) as the total sales. After that we run the normal rank function and by the end we group by the Company_ID. We will get a rank from the highest total sales to the lowest by customer.
4. Running Differences
The final scenario that is interesting to look at is to calculate Running Differences. This can be done with the help of the LAG Window Function as below:
The LAG value is calculated by taking the previous value for every specific customer for every date. By using the calculation of the current value and subtracting the Lagged value we can calculate a running difference. This could be helpful to see which products or customers that sell most a specific day.
Window functions are an important tool in the SQL Toolbox and can help to write effective and quick queries. There are several types of Window Functions such as Aggregated, Ranking, Distribution and Offset Window Functions.
There are three main building blocks that Windows functions are built on, Partitioning, Ordering and Framing which each are important to get the right output of the functions.
Some main business scenarios where Window functions can be used is when calculating Running Totals/Averages, Rank functions, Ranking with Group By or calculating Running Differences.
BeardedDev Youtube Page: https://www.youtube.com/channel/UCWczzX6DyFV2KOYP_-CEZUA