Analytic Function — Lag

Starting with SQL Server 2012, Microsoft has included analytic functions within the database.  A couple that I use often, Lag and Lead, are very handy when you need to reference a previous or next row in a table without having to use self-joins or having to create a complicated sql statement.

Today I will show you how to use the LAG statement, as I had to use it recently to create a query to compare run times of ETL processes to see how they are performing.

We use custom auditing to track when an ETL package was started, ended and how long it took to run.  New versions of SSIS have built in tracking, but we are currently not using it, so this is what we have.

In our ETL_Stats table below you can see the package name, the start time, the end time and how long it took to run.

e1

 

What we need to do is compare by package_name, the current days duration compared to the previous days duration.  The syntax for Lag may look confusing below, but after you write a few statements, you will get the hang of it.  Here is a link to Books Online about the Lag function:

https://msdn.microsoft.com/en-us/library/hh231256.aspx

For our example below:

select package_name,starttime,duration,
lag(duration,1,null) over (partition by package_name order by package_name,starttime) as Prev_Days_Run_Time
from [dbo].Warehouse_ETL_Stats   where starttime>=getdate()-4
order by 1 ,2 desc

The option we used for the Lag function is to go back 1 row so we get the previous occurrence.  When we run the query, you can see in the column Prev_Days_Run_Time, the value is the previous days duration.  Just follow the arrows.

We partition by Package_Name which is the grouping we want to analyze, and then we order by package_name and starttime.  This gives us the correct order for our comparison.

e2

 

Now we need to get the difference in the current days duration compared to the previous days.  For that we need to use the DATEDIFF function.  This allows us to subtract the Previous Days values from the current days duration.  We are going to look for the difference in Minutes, so we use the MI parameter.

select package_name,starttime,duration,
lag(duration,1,null) over (partition by package_name order by package_name,starttime,duration) as Prev_Days_Run_Time,
datediff(mi,lag(duration,1,null) over (partition by package_name order by package_name,starttime,duration),duration ) as Diff_Minutes_Between_RunTimes
from [dbo].Warehouse_ETL_Stats   where starttime>=getdate()-4
order by 1 ,2 desc

 

When we run this query, we can see that the first row ran 18 minutes slower than it did the day before and the day before that, it ran 6 minutes slower than the day before that.

e3

 

Analytic functions are very powerful and can really simplify query creation.  They usually also run faster than a complicated query that you would need to write to get similar results.  In a future post, I will show you statistics proving this.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s