Tuesday, January 27, 2009

Analytic Function: Lag

Lag: Analytic function to retrieve a value from the previous row.

Usage:

Lag([column]) OVER (PARTITION BY [partition_column] ORDER BY [ordering_column])


column - column to be pulled in from the previous row
ordering_column - column by which a partition will be ordered
partition_column - column to be used to partition data. On first thought it might seem that this column can as well be included in the ordering_column list. On careful inspection one may notice that doing so can result in incorrect data. Suppose the requirement is not just to pull data from any previous row; rather only rows belonging to the same class as the current row; in such a case including the partition_column under ordering_column will produce an incorrect result.

Eg: See below example: Fetching transaction amounts for customers over a two months: Here the data needs to be partitioned by the customer and then the previous row may be pulled in.


CREATE TABLE tmp828cust(month date, custno number, cust_name varchar2(50))
CREATE TABLE tmp828transact(month date, transactno number, custno number, amount number)

Required output:
CUSTNO CUST_NAME TOTAL-MONTH1 TOTAL-MONTH2
1 A 5000 5500
2 B 1000 500
3 C 2000 1800



Inefficient Solution

SELECT t1.custno,
t1.cust_name,
t1.tot,
t2.tot
FROM
(
SELECT c.custno,
c.cust_name,
Sum(t.amount) tot
FROM tmp828cust c,
tmp828transact t
WHERE c.month = t.month
AND c.custno = t.custno
AND t.MONTH = '01-JAN-2008'
GROUP BY c.custno, c.cust_name
)t1,
(
SELECT c.custno,
c.cust_name,
Sum(t.amount) tot
FROM tmp828cust c,
tmp828transact t
WHERE c.month = t.month
AND c.custno = t.custno
AND t.MONTH = '01-FEB-2008'
GROUP BY c.custno, c.cust_name
)t2
WHERE t1.custno = t2.custno




Better Solution

SELECT *
FROM (SELECT custno,
cust_name,
month,
amt,
Lag(amt) OVER (PARTITION BY custno ORDER BY month) last_amt
FROM (SELECT t.month,
c.custno,
c.cust_name,
Sum(t.amount) amt
FROM tmp828cust c,
tmp828transact t
WHERE c.month = t.month
AND c.custno = t.custno
GROUP BY t.month,
c.custno,
c.cust_name
)
)
WHERE last_amt IS NOT NULL


Notice the usage of Lag function here
Lag(amt) OVER (PARTITION BY custno ORDER BY month) : Get the amt column from the previous row, rows should be Partitioned by custno and Ordered by month

References:
1. http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1396477600346947904
2. http://www.adp-gmbh.ch/ora/sql/analytical/lag.html

Links