All Forums Database
indrajit_td 50 posts Joined 10/09
28 May 2013
Transposing the data - QUERY

Hello,
We have come across a scenario where we need to update the previous records in a target table based on if the value is NULL upto its last NON NULL occurence. Here is the test scenario we have tried to mock up:
-- Definition of target table
Create table test_target
(ID integer,
 update_time timestamp(6),
 change_log_1 varchar(10),
 change_log_2 varchar(10),
 change_log_3 varchar(10),
 change_log_4 varchar(10),
 change_log_5 varchar(10),
 start_date timestamp(6),
 end_date timestamp(6)
) PRIMARY INDEX (ID);
-- Test values - Existing Target table shows that there are many NULL values
INSERT INTO test_target values (1234,'2013-01-01 00:00.000000','abc','def',NULL,NULL,NULL,'2013-01-01 00:00.000000','2013-02-01 02:00.000000');
INSERT INTO test_target values (1234,'2013-02-01 02:00.000000',NULL,NULL,'cdf','fgd','ref','2013-02-01 02:00.000000','2013-03-01 03:00.000000');
INSERT INTO test_target values (1234,'2013-03-01 03:00.000000',NULL,NULL,'fgr','lkj',NULL,'2013-03-01 03:00.000000','2013-04-01 04:00.000000');
INSERT INTO test_target values (1234,'2013-04-01 04:00.000000','cfg',NULL,'tyh','poi','fdt','2013-04-01 04:00.000000','2013-05-01 05:00.000000');
INSERT INTO test_target values (1234,'2013-05-01 05:00.000000',NULL,NULL,'trd',NULL,NULL,'2013-05-01 05:00.000000','2013-06-01 06:00.000000');
INSERT INTO test_target values (1234,'2013-06-01 06:00.000000','erf',NULL,'hjk',NULL,'iuf','2013-06-01 06:00.000000','2013-07-01 07:00.000000');
INSERT INTO test_target values (1234,'2013-07-01 07:00.000000',NULL,NULL,NULL,'gdf',NULL,'2013-07-01 07:00.000000','3999-12-31 23:99:999999');
-- Load table definition
Create test_load
(ID integer,
 update_time timestamp(6),
 columnname varchar(30),
 change_from varchar(10),
 change_to varchar(10)
) PRIMARY INDEX (ID);
-- The delta coming in overrides what information was previously sent.
-- It shows columns were changed from a value other than NULL which means the earlier information that was sent
-- is now updated. For e.g.change_col_2 was changed from lks to fgc but earlier it had NULL value
INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_1','frs','txh');
INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_2','lks','fgc');
INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_4','gdf','fdr');
INSERT INTO test_load values (1234,'2013-08-01 08:00.000000','change_col_5','ytf','tjb');
-- Another definition for the target table this is ideally the same table but it has been created only to see
-- what results are expected. This is ideally the same table test_target
-- NULLs in the history should be updated to the change_from values (but only upto a point where the previous value is NOT NULL. See the difference between change_col_1 & change_col_2)
Create table test_target_expected_res
(ID integer,
 update_time timestamp(6),
 change_log_1 varchar(10),
 change_log_2 varchar(10),
 change_log_3 varchar(10),
 change_log_4 varchar(10),
 change_log_5 varchar(10),
 start_date timestamp(6),
 end_date timestamp(6)
) PRIMARY INDEX (ID);
INSERT INTO test_target values (1234,'2013-01-01 00:00.000000','abc','def',NULL,NULL,NULL,'2013-01-01 00:00.000000','2013-02-01 02:00.000000');
INSERT INTO test_target values (1234,'2013-02-01 02:00.000000',NULL,'lks','cdf','fgd','ref','2013-02-01 02:00.000000','2013-03-01 03:00.000000');
INSERT INTO test_target values (1234,'2013-03-01 03:00.000000',NULL,'lks','fgr','lkj',NULL,'2013-03-01 03:00.000000','2013-04-01 04:00.000000');
INSERT INTO test_target values (1234,'2013-04-01 04:00.000000','cfg','lks','tyh','poi','fdt','2013-04- 01 04:00.000000','2013-05-01 05:00.000000');
INSERT INTO test_target values (1234,'2013-05-01 05:00.000000',NULL,'lks','trd',NULL,NULL,'2013-05-01 05:00.000000','2013-06-01 06:00.000000');
INSERT INTO test_target values (1234,'2013-06-01 06:00.000000','erf','lks','hjk',NULL,'iuf','2013-06-01 06:00.000000','2013-07-01 07:00.000000');
INSERT INTO test_target values (1234,'2013-07-01 07:00.000000','frs','lks',NULL,'gdf','ytf','2013-07-01 07:00.000000','2013-08-01 08:00.000000');
INSERT INTO test_target values (1234,'2013-08-01 08:00.000000','txh','fgc',NULL,'fdr','tjb','2013-08-01 08:00.000000','3999-12-31 23:99:999999');
We are trying to achieve what we get in the table test_target_expected_res using a single query or even multiple passes.

 
Thank You,
Indrajit
 
 

ulrich 816 posts Joined 09/09
29 May 2013

Hi,
your timestamps values are not valid. As you expect specific results it would be good if you can review the data.
timestamp need to be of format 'YYYY-MM-DD HH:Mi:SS.ssssss' - I guess you missed the Hour? And you can't have more then 59.999999 secs...

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

indrajit_td 50 posts Joined 10/09
30 May 2013

-- STAGE TABLE
CREATE TABLE stage_tbl
(ID INTEGER,
change_id INTEGER,
update_time TIMESTAMP(6),
columnname VARCHAR(30),
change_log_from VARCHAR(10),
change_log_to VARCHAR(10)
) PRIMARY INDEX (ID);

INSERT INTO stage_tbl VALUES (1234,5678,'2013-07-01 07:00.00.000000','change_col_1','','erf');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 08:00.00.000000','change_col_2','lks','txh');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_1','erf','fdr');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_2','txh','rep');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_3','','fdy');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_4','rfd','uif');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 09:00.00.000000','change_col_5','lop','jhr');
INSERT INTO stage_tbl VALUES (1234,5678,'2013-08-01 11:00.00.000000','change_col_5','jhr','puh');
INSERT INTO stage_tbl VALUES (1234,78965,'2013-07-01 07:00.00.000000','change_col_1','rdt','ted');
INSERT INTO stage_tbl VALUES (1234,78965,'2013-08-01 08:00.00.000000','change_col_2','wfs','trh');
INSERT INTO stage_tbl VALUES (1234,78965,'2013-08-01 09:00.00.000000','change_col_3','','');
INSERT INTO stage_tbl VALUES (1234,23654,'2013-08-01 08:00.00.000000','change_col_4','gfd','');
INSERT INTO stage_tbl VALUES (1234,32111,'2013-08-01 09:00.00.000000','change_col_5','red','');

-- WRITE A QUERY ON STAGE TABLE SO THAT WE GET THE RESULTS AS IN expected_res below:

CREATE TABLE expected_res
(ID INTEGER,
change_id INTEGER,
update_time TIMESTAMP(6),
change_col_1 VARCHAR(10),
change_col_2 VARCHAR(10),
change_col_3 VARCHAR(10),
change_col_4 VARCHAR(10),
change_col_5 VARCHAR(10)
) PRIMARY INDEX (ID);

INSERT INTO expected_res VALUES (1234,5678,'2013-07-01 07:00.00.000000','erf','lks',NULL,'rfd','lop');
INSERT INTO expected_res VALUES (1234,5678,'2013-08-01 08:00.00.000000','erf','txh',NULL,'rfd','lop');
INSERT INTO expected_res VALUES (1234,5678,'2013-08-01 09:00.00.000000','fdr','rep','fdy','uif','jhr');
INSERT INTO expected_res VALUES (1234,5678,'2013-08-01 11:00.00.000000','fdr','rep','fdy','uif','puh');
INSERT INTO expected_res VALUES (1234,78965,'2013-07-01 07:00.00.000000','ted','wfs',NULL,NULL,NULL);
INSERT INTO expected_res VALUES (1234,78965,'2013-08-01 08:00.00.000000','ted','trh',NULL,NULL,NULL);
INSERT INTO expected_res VALUES (1234,78965,'2013-08-01 09:00.00.000000','ted','trh',NULL,NULL,NULL);
INSERT INTO expected_res VALUES (1234,23654,'2013-08-01 08:00.00.000000',NULL,NULL,NULL,'gfd',NULL);
INSERT INTO expected_res VALUES (1234,32111,'2013-08-01 09:00.00.000000',NULL,NULL,NULL,NULL,'red');

LOGIC:
- The row in the target table should be at ID/Change_Id/update_time level
- If the value for a particluar column doesnt come in the following delta then we should take the previous value for e.g. 1234 5678 2013-07-01 07:00.00.000000
we got the value for change_column_1 as erf
but in the next days run i.e. 2013-08-01 08:00.00.000000 there was no value for change_column_1 as it had no change so we need to take the previuos days value.

ulrich 816 posts Joined 09/09
30 May 2013

Are you looking for something like 

select ID, 
       change_id, 
       update_time,
       substr(max(case when columnname = 'change_col_1' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col1,       
       substr(max(case when columnname = 'change_col_2' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col2,       
       substr(max(case when columnname = 'change_col_3' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col3,       
       substr(max(case when columnname = 'change_col_4' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col4,      
       substr(max(case when columnname = 'change_col_5' then cast(update_time as char(30)) ||change_log_to else null end) over (partition by ID,change_id order by update_time, columnname rows between unbounded preceding and current row ),31) as last_val_col5      
from stage_tbl
qualify row_number() over (partition by id, change_id, update_time order by columnname desc ) = 1

Its not 100% as of your expected results but it is also unclear to me how
INSERT INTO expected_res VALUES (1234,5678,'2013-07-01 07:00.00.000000','erf','lks',NULL,'rfd','lop');
should be generated from your stg data...

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

indrajit_td 50 posts Joined 10/09
30 May 2013

Thank you.

This is generated as the value in change_to column is erf for change_col_1 for date '2013-07-01 07:00.00.000000'
- change_col_2 got changed on 2013-08-01 08:00.00.000000 from lks to txh so its value earlier to 2013-08-01 08:00.00.000000 should be lks
- change_col_3 got changed on 2013-08-01 08:00.00.000000 from SPACE(NO VALUE should be treated as NULL) to fdy so its value earlier to 2013-08-01 09:00.00.000000
should be NULL
- same as change_col_2 for change_col_4 & change_col_5

ulrich 816 posts Joined 09/09
30 May 2013

you could do the same with min and the change_log_from column 
build two derived tables (one with the max one with the min) 
and join the results together and coalesce the max, min values. 

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

indrajit_td 50 posts Joined 10/09
04 Jun 2013

@ulrich - Thank you

You must sign in to leave a comment.