All Forums Database
CrayRay 4 posts Joined 04/16
03 Apr 2016
Recursive Row Function

Hi guys,
I have a situation as below:
a    1    0.1
a    2    0
a    3    0
b    1    0.1
b    2    0.2
b    3    0
 
Data is partitioned by Col1, sorted by Col2
I want to create column 4 with the rules below:

  • For a given value in Col1: If row 2 col3 value < row 1 col3 value, then average it, else use row 2 col3 value

So at the end the outcome should look like this:
 
a   1   0.1   0.1
a   2   0      0.05
a   3   0      0.025
b   1   0.1   0.1
b   2   0.2   0.2
b   3   0      0.1
 
Thanks!
 
Kind Regards,
Raymond
 

CrayRay 4 posts Joined 04/16
03 Apr 2016

Updated Rules (to make it clearer):

  1. If first row of new Col1, then Col4 = Col3
  2. If not first row,
    • If Col3 < PrevRow_Col4 then Col4 = (Col3 + PrevRow_Col4) / 2
    • else Col4 = Col3
  3. Next row.

 

dnoeth 4628 posts Joined 11/04
03 Apr 2016

Hi Raymond,
you probably need recursion for this:

WITH RECURSIVE prev AS
 (
   SELECT tab.*, col3 AS col4
   FROM tab 
   WHERE col2 = 1
   
   UNION ALL
   
   SELECT t.*,
      CASE WHEN t.col3 < prev.col4 THEN (t.col3 + prev.col4)/2 ELSE t.col3 END
   FROM tab AS t 
   JOIN prev
     ON t.col1= prev.col1
    AND t.col2 = prev.col2+1
 )
SELECT * 
FROM prev 

 
Assuming you simplified the actual data, if col2 is not a sequence you must materialize a ROW_NUMBER in a Volatile Table:

CREATE VOLATILE TABLE vt AS
 ( SELECT ...
     ,ROW_NUMBER() 
      OVER (PARTITION BY col1
            ORDER BY col2) AS rn
   FROM tab
 )
WITH DATA
PRIMARY INDEX (col1)
ON COMMIT PRESERVE ROWS

and use rn instead of col2.
 
Can you add more details, how many rows per col1 and actual data for col3?
Maybe it's possible with OLAP-functions, too.

Dieter

CrayRay 4 posts Joined 04/16
03 Apr 2016

Col2 is simplified but they are dates in numeric format (ie 20150215) and they increase on a yearly basis (next input will be 20160215)
By that definition, Number of rows per col1 will be increasing every year.
I'll give the code a try. Thanks!

dnoeth 4628 posts Joined 11/04
04 Apr 2016

You could use col2 + 10000, but you will need a FIRST_VALUE or ROW_NUMBER in your seed query, so materializing your data should be the most efficient.

Dieter

You must sign in to leave a comment.