Hi there, today we’ll be talking about LEAD and LAG functions in SQL. For the purposes of this article, we’ll be using AdventureWorks2016 on Microsoft SQL server.
For starters, the LEAD and LAG functions were first introduced in SQL Server 2012. They are window functions.
The LEAD function is used to access data from SUBSEQUENT rows along with data from the current row.
The LAG function is used to access data from PREVIOUS rows along with data from the current row.
An ORDER BY clause is required when working with LEAD and LAG functions, but a PARTITION BY clause is optional.
Now, let’s look at some examples. Imagine a manager asks an analyst to run a query of all the order quantity values, along with another column for each preceding order quantity value. That query would look like this:
1 2 3 4 5 6
SELECT [OrderQty], LAG([OrderQty]) OVER(ORDER BY[OrderQty] DESC) AS[Lag "OrderQty" Column] FROM[Sales].[SalesOrderDetail] ORDER BY[OrderQty] DESC
In the example above, the first column in the SQL query is the order quantity column, the second column makes use of the LAG function, acting on the order quantity column. The OVER() clause is then applied (because LAG and LEAD are window functions), wherein the new column being formed - the
[Lag “OrderQty” Column] - is ordered by order quantity in descending order. The entire query is also ordered by the order quantity in descending order. The result of this query will appear like this:
In the image above, in the Lag “Order Qty” column, the first value is NULL. This is because the LAG function is supposed to return the last value from the current row. Because the first value in the OrderQty column is 44, there is no value for the LAG function to return, hence the NULL value in the first cell of the second column.
In the third row in the image above, the values for each of the columns are 40 and 41. The value 41 in the second column is a previous value of the first column, brought forward. This is how the LAG function works.
For the LEAD function, imagine a manager asks the data analyst to produce a query showing all the order quantity values, along with each following order quantity value. The query for that would look like this:
1 2 3 4 5 6
SELECT [OrderQty], LEAD([OrderQty]) OVER(ORDER BY[OrderQty] DESC)[Lead "OrderQty" Column] FROM[Sales].[SalesOrderDetail] ORDER BY[OrderQty] DESC
The results of the query are similar to the previous one, with the exception of the LEAD function being exchanged with the LAG function.
The result of the query above would appear like this:
In the image above, there’s no NULL value appearing in the results set, but there is a difference. In row one, the value 41, in the Lead “OrderQty” column is alongside the value 44 in the OrderQty column. The value 41 is brought forward by the LEAD function from the second row where 41 resides in the OrderQty column. This is how the LEAD function works.
It is also possible to LEAD or LAG by a specific number of rows. What this means is that, with the LEAD function, for example, I can specify to start bringing forward values starting from after the next N rows. Let’s look at another example.
1 2 3 4 5 6
SELECT [OrderQty], LEAD([OrderQty], 2) OVER(ORDER BY[OrderQty] DESC)[Lead "OrderQty" Column] FROM[Sales].[SalesOrderDetail] ORDER BY[OrderQty] DESC
In the query above, the LEAD function is being used to bring values forward, not from the current row, but starting from after the next two rows. The result of this query will be this:
In the image above, both rows one and two of the second column contain the value 40. This is because in column one (OrderQty) the third and fourth values are 40 and 40. This is how offsetting the values in the LAG and LEAD function work.