All Forums General
joquinn238 4 posts Joined 02/12
16 Mar 2012
NEED HELP !! - Merging Consecutive and OVERLAPPING date spans

I have been struggling for about 6 months trying to find a teradata query that will combine consecutive date spans as well as identify overlapping date spans and then combine them into one span..and when there is a gap in spans it will create a new row and show that span and continue etc..so that in the end it will show consecutive and overlapping date spans as one row per span....I am new to Teradata so I will need some specific help in writing this query..Hopefully someone here has already done this and will be able to share the query with me....I do appreciate your help....

 

John

ulrich 816 posts Joined 09/09
16 Mar 2012

can you share a ddl on which your problem is based?

 

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

ulrich 816 posts Joined 09/09
16 Mar 2012

to give you an idea - there might be more efficient way do do it...

generate some test data

create table date_range _test as (select calendar_Date from_dt , day_of _week + calendar_date to_dt from sys_calendar.calendar where calendar_date between date - 90 and date and random(1,10) < 3) with data primary index (from_dt) 

this would be

from_dt	to_dt
2011-12-18          	2011-12-19          
2011-12-21          	2011-12-25          
2011-12-31          	2012-01-07          
2012-01-06          	2012-01-12          
2012-01-11          	2012-01-15          
2012-01-16          	2012-01-18          
2012-01-17          	2012-01-20          
2012-01-21          	2012-01-28          
2012-01-23          	2012-01-25          
2012-01-29          	2012-01-30          
2012-02-01          	2012-02-05          
2012-02-06          	2012-02-08          
2012-02-09          	2012-02-14          
2012-02-13          	2012-02-15          
2012-02-17          	2012-02-23          
2012-02-20          	2012-02-22          
2012-03-06          	2012-03-09          
2012-03-07          	2012-03-11          
2012-03-08          	2012-03-13          

will be interpreted as periods where to_dt is not included in the period (as in the period data type)

this SQL should do the trick on 13.10

select range, period(min(day_date), max(day_date)+1)
from
(
select day_date, sum(range_start_flag) over (order by day_date rows between unbounded preceding and current row) as range
from
(
select day_date, case when day_date-1 > min(day_date) over (order by day_date rows between 1 preceding and 1 preceding) then 1 when min(day_date) over (order by day_date rows between 1 preceding and 1 preceding) is null then 1 else 0 end as range_start_flag
from
(
select period(from_dt,to_dt) as date_period,begin(tmp) as day_date 
from date_range_test 
expand on date_period as tmp
) as t1
) as t2 
) as t3
group by 1
order by 1

result is - sorry for the different date format, but didn't figured out how to change this right now. Again, period end is excluded...

range    combined_period
1    ('11/12/18', '11/12/19')                                    
2    ('11/12/21', '11/12/25')                                    
3    ('11/12/31', '12/01/15')                                    
4    ('12/01/16', '12/01/20')                                    
5    ('12/01/21', '12/01/28')                                    
6    ('12/01/29', '12/01/30')                                    
7    ('12/02/01', '12/02/05')                                    
8    ('12/02/06', '12/02/08')                                    
9    ('12/02/09', '12/02/15')                                    
10    ('12/02/17', '12/02/23')                                    
11    ('12/03/06', '12/03/13')                                    
 

 

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

joquinn238 4 posts Joined 02/12
16 Mar 2012

This is awesome..This is the ouput I got

1 ('2002-12-01', '2003-08-31')
2 ('2003-09-01', '2003-12-31')
3 ('2004-02-01', '2004-02-29')
4 ('2005-07-01', '2005-09-30')
5 ('2005-10-01', '2005-10-31')
6 ('2005-11-01', '2006-02-28')
7 ('2006-03-01', '2006-07-31')
8 ('2006-08-01', '2006-09-30')
9 ('2006-11-01', '2007-09-30')
10 ('2007-10-01', '2008-06-30')
11 ('2008-07-01', '2008-09-30')
12 ('2008-10-01', '2009-02-28')
13 ('2009-03-01', '2009-05-31')
14 ('2009-06-01', '2010-07-31')
15 ('2010-08-01', '2010-12-31')
16 ('2011-03-01', '9999-12-31')

However I need it to consolidate the consecutive spans and break and start a new row when the spans break

and I need the output in 2 seperate columns .....for instance....

EFFECTIVE      ENDDATE
12/1/2002      12/31/2003
2/1/2004        2/29/2004
7/1/2005        9/30/2006
11/1/2006      12/31/2010
3/1/2011        12/31/9999
 

If you can revise so that my output is consolidated and in two columns...we have a WINNER !!!

 

Thank you so much

 

dnoeth 4628 posts Joined 11/04
16 Mar 2012

Hi Ulrich/John,

if you're on TD13.10 instead of EXPAND better use one of those hardly known, but quite powerful new functions. Then it's really easy :-)

WITH cte(grpcol,pd) AS
 (
   SELECT grpcol, PERIOD(from_dt,to_dt) 
   FROM date_range_test
 )
SELECT *
FROM TABLE
 ( TD_SYSFNLIB.TD_NORMALIZE_OVERLAP_MEET(NEW VARIANT_TYPE(cte.grpcol), cte.pd)
   RETURNS (grpcol INT, pd PERIOD(DATE), cnt INT)
   HASH BY grpcol
   LOCAL ORDER BY grpcol, pd
 ) AS dt
ORDER BY 1,2;

If you're on an older release you could do it with some nested OLAP functions, too, e.g.:

http://www.orafaq.com/usenet/comp.databases.theory/2004/12/04/0111.htm

There's a more current/efficient solution, too, but i couldn't find it right now...

Dieter

Dieter

dnoeth 4628 posts Joined 11/04
16 Mar 2012

Hi John,

based on your previous post you're on TD13.10, i just noticed that :-)

Periods include the starting date, but exclude the end date, you probably have to use PERIOD(from_dt, to_dt + 1).

And BEGIN(pd), LAST(pd) to get seperate columns again.

Dieter

Dieter

ulrich 816 posts Joined 09/09
19 Mar 2012

Thanks Dieter,

these had been on my list to explore for some time. I am impressed.

Run this on a 1.3 mio row table.

The normed cpu is 1/3 of the solution I outlined. And the ellapsed time was even faster - on a single user machine.

The syntax need some time to get used to it but seems to be definitly worth.

Thanks again!

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

dnoeth 4628 posts Joined 11/04
19 Mar 2012

Hi Ulrich,
yep, these functions are great, yet they're hardly known :-)

And the manuals don't provide any details.

When you get rid of the EXPAND (which might result in a potentially huge intermediate spool) your solution will be more efficient, too.

This is the solution i was talking about, which also runs pre-TD13, and performance is quite good: 

SELECT grpcol, from_dt, 
  COALESCE(MAX(x)
           OVER (PARTITION BY grpcol  
                 ORDER BY from_dt
                 ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), x2)
FROM
 (
   SELECT grpcol, from_dt, 
     MAX(to_dt) 
     OVER (PARTITION BY grpcol 
           ORDER BY from_dt,to_dt
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS x,
     MAX(to_dt) 
     OVER (PARTITION BY grpcol) AS x2
   FROM date_range_test
   QUALIFY from_dt > x OR x IS NULL
 ) AS dt

Btw, using those period functions: when to_dt must be included it's not only PERIOD(from_dt, to_dt + 1) and LAST(pd) , the function must change to TD_NORMALIZE_OVERLAP, too.

Dieter

 

Dieter

JimmyLee 13 posts Joined 06/10
19 Jun 2012

I'm having problems when trying to use the normalize funcitons with varchar grouping columns.

Given this table:

CREATE SET TABLE JLEE80.cte_base

(

Pstncd VARCHAR(8) CHARACTER SET LATIN NOT CASESPECIFIC,

EffPer PERIOD(DATE))

PRIMARY INDEX ( Pstncd )

;

This code generates an error ("The output grouping column list does not match with input grouping list):

WITH cte(pstncd, effper) AS

(

SELECT pstncd , effper

FROM cte_base

)

SELECT *

FROM TABLE

( TD_SYSFNLIB.TD_NORMALIZE_OVERLAP_meet(new Variant_type(cte.pstncd), cte.effper)

RETURNS (pstncd varchar(8), effper period (date))

HASH BY pstncd

LOCAL ORDER BY pstncd, effper

) AS DT

;

However, this code runs fine:

WITH cte(pstncd, effper) AS

(

SELECT cast(pstncd as int) , effper

FROM cte_base

)

SELECT *

FROM TABLE

( TD_SYSFNLIB.TD_NORMALIZE_OVERLAP_meet(new Variant_type(cte.pstncd), cte.effper)

RETURNS (pstncd int, effper period (date))

HASH BY pstncd

LOCAL ORDER BY pstncd, effper

) AS DT

;

Any clues as to what is happening?  Do the period normalize functions not support varchar grouping columns?

Thanks.

"To not give your best is to sacrifice the gift" -- Steve Prefontaine

zskuza 15 posts Joined 04/11
16 Mar 2015

Dieter, I have a related problem where I'm attempting to aggregate the counts within the following rows where their dates overlap:

ID Count StartDate EndDate

1 1 31/10/2006 00:00:00 31/12/9999 23:59:59

1 1 17/08/2009 00:00:00 31/12/9999 23:59:59

 

Into the following output:

ID Count StartDate EndDate

1 1 31/10/2006 00:00:00 16/08/2009 23:59:59

1 2 17/08/2009 00:00:00 31/12/9999 23:59:59

 

Can you assist?

dnoeth 4628 posts Joined 11/04
17 Mar 2015

Hard to tell with two example rows :)

SELECT
   id, count, StartDate, 
   COALESCE(MIN(StartDate) 
            OVER (PARTITION BY id
                  ORDER BY StartDate, EndDate
                  ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - INTERVAL '1' SECOND
           ,EndDate)
FROM tab

 

Dieter

zskuza 15 posts Joined 04/11
17 Mar 2015

Sorry, was hoping 2 records would be enough.
Anyway the query you provided is nearly right. The count value for the first and second record need to be summed for the period during which they overlap. I tried changing count to SUM(count) but then get an error that ordered analytical functions can't be used in GROUP BY...

SaGo 2 posts Joined 03/15
18 Mar 2015

Hi,
Can anyone help me in this?
There is a scenario where we have records with overlapping dates and we need to split the records based on dates, considering the recently updated record. 
Consider the below examples -

 

Scenario 1:

Input

Source  Destination     Begin_Date  End_Date     Last_Updated      Value

ABC      123                 3/15/2015    12/15/2015  3/17/2015 9:18    50

ABC      123                 4/15/2015    5/15/2015    3/18/2015 2:31    44

 

Ouput

Source  Destination     Begin_Date  End_Date      Value

ABC       123                3/15/2015    4/14/2015       50

ABC       123                4/15/2015    5/15/2015       44

ABC       123                5/16/2015    12/15/2015     50

 

Scenario 2:

Input

Source  Destination     Begin_Date  End_Date      Last_Updated    Value

ABC       123                 3/15/2015    12/15/2015  3/17/2015 9:18    50

ABC       123                 4/15/2015    12/15/2015  3/18/2015 2:31    44

 

Ouput

Source  Destination     Begin_Date  End_Date    Value

ABC       123                3/15/2015    4/14/2015       50

ABC       123                4/15/2015    12/15/2015     44

 

Scenario 3:

Input

Source   Destination    Begin_Date   End_Date      Last_Updated     Value

ABC        123                4/15/2015    12/15/2015   3/17/2015 9:18   50

ABC        123                3/15/2015    12/15/2015   3/18/2015 2:31   44

 

Ouput

Source   Destination    Begin_Date    End_Date     Value

ABC        123                3/15/2015     12/15/2015     50

 

 

Thanks!
 

 

Thanks!
SaGo

zskuza 15 posts Joined 04/11
19 Mar 2015

SaGo,
For Scenario 2, this code should do the trick:

SELECT

Source

,Destination

,Begin_Date

,COALESCE(MIN(Begin_Date) OVER (PARTITION BY Source, Destination ORDER BY Begin_Date ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - 1, End_Date) AS End_Date

,End_Date AS Orig_End_Date

FROM tbl

 

I am currently using something like this to cleanse a table with overlapping dates. The original logic came from Dieter, of course :-)

 

Scenario 1 could potentially be done with a self-referencing left join and OVERLAPS.

 

Scenario 3 I find strange. You want to report the value of 50 for a period of time when it wasn't even valid?

SaGo 2 posts Joined 03/15
19 Mar 2015

Thanks for your reply! 
Sorry.. For scenario 3, value should be 44. It was typo :(
I found logic for these scenarios. But now, need to catch the bewlo scenario:
Scenario:
Input
Source   Destination    Begin_Date   End_Date      Last_Updated     Value
ABC        123                4/20/2015    12/12/2015   3/16/2015 8:10   60 (oldest)
ABC        123                4/15/2015    12/15/2015   3/17/2015 9:18   50 
ABC        123                4/15/2015     5/15/2015    3/18/2015 2:31   44 
ABC        123               11/15/2015   12/15/2015   3/19/2015 2:31   33 (latest)
Ouput
Source   Destination    Begin_Date    End_Date     Value
ABC        123                4/15/2015     5/15/2015     44
ABC        123               11/15/2015   12/15/2015    33
ABC        123                5/16/2015    11/14/2015    50 (gap period in above records which should take value from recent record having the gap period i.e.50)

Thanks!
SaGo

zskuza 15 posts Joined 04/11
22 Mar 2015

SaGo, I've had a bit of a look through the SQL reference docs and found that LDIFF and RDIFF might be the right approach for this problem.
Here's some code I've been playing with just now:

SELECT

PERIOD(A.Begin_Date, A.End_Date) LDIFF PERIOD(B.Begin_Date, B.End_Date) AS LeftDiff

,PERIOD(A.Begin_Date, A.End_Date) RDIFF PERIOD(B.Begin_Date, B.End_Date) AS RightDiff

,A.*, B.*

FROM tbl A

LEFT JOIN tbl B

ON A.Source = B.Source

AND A.Destination = B.Destination

AND A.Last_Upd > B.Last_Upd

AND (PERIOD(A.Begin_Date, A.End_Date) LDIFF PERIOD(B.Begin_Date, B.End_Date) IS NOT NULL

OR PERIOD(A.Begin_Date, A.End_Date) RDIFF PERIOD(B.Begin_Date, B.End_Date) IS NOT NULL

OR PERIOD(A.Begin_Date, A.End_Date) OVERLAPS (B.Begin_Date, B.End_Date))

 

TDScott 5 posts Joined 09/14
31 Aug 2015

This forum example was very helpful, thanks.  
Here is what I ended up doing based on this, in case it helps anyone:


--drop table LAB.date_gap
-- create range table
CREATE MULTISET TABLE LAB.date_gap, NO JOURNAL, NO FALLBACK AS
(
   select TRIM(UPPER(f.unq_id)) as unique_id, period(cast(f.cID_strt_dts as date), cast(f.cID_END_DTS as date) + 2) as duration_period
            from
            PKG.Date_FACT f
            where cast(f.cID_strt_dts as date) < cast(f.cID_END_DTS as date)
                  and f.cID_strt_dts is not null and f.cID_END_DTS is not null
)WITH  DATA PRIMARY INDEX(unique_id);



--drop table LAB.date_gap2
-- compress ranges into seperate table
CREATE MULTISET TABLE LAB.date_gap2, NO JOURNAL, NO FALLBACK AS
(
         WITH subtbl(unique_id,duration_period) AS
      (   
        SELECT unique_id, duration_period 
   FROM LAB.date_gap
   )
   SELECT *
   FROM TABLE (TD_SYSFNLIB.TD_NORMALIZE_OVERLAP_MEET(NEW VARIANT_TYPE(subtbl.unique_id),
                                                     subtbl.duration_period)
   RETURNS (unique_id VARCHAR(30), duration_period PERIOD(DATE), NrmCount INT)
   HASH BY unique_id     
   LOCAL ORDER BY unique_id, duration_period)     
   AS DT(unique_id, duration_period, NrmCount) 
   ) WITH  DATA PRIMARY INDEX(unique_id);


  -- find gap counts overall, count of 1 is no gaps
   select count(*), gapcount from (
   select  unique_id, COUNT(*) as gapcount from LAB.date_gap2 
    GROUP BY unique_id ) a group by gapcount


--drop table LAB.date_gap3
-- based on those that have 1 gap (as per my example majority are there) create way to see the gaps ranges on those
CREATE MULTISET TABLE LAB.date_gap3, NO JOURNAL, NO FALLBACK AS
(
select unique_id,
    max(case when seq =1 then end(duration_period) end) datedur_period1,
    max(case when seq =2 then begin(duration_period) end) datedur_period2

from
(
      SELECT unique_id, duration_period,
      row_number() over(partition by unique_id order by duration_period asc) seq
FROM
LAB.date_gap2    
) d
group by unique_id
having count(*) = 2
   ) WITH  DATA PRIMARY INDEX(unique_id);


-- and what are the actual gaps grouped by year buckets, you could do weeks/etc. probably a better way.
select sum(countss), 
case 
when difference > 2520 then 'greater than 7yr' 
when difference < 2520 and difference > 2160 then '7 year'
when difference < 2160 and difference > 1800 then '6 year' 
when difference < 1800 and difference > 1440 then '5 year'
when difference < 1440 and difference > 1080 then '4 year'
when difference < 1080 and difference > 720 then '3 year'
when difference < 720 and difference > 360 then '2 year'
when difference < 360 then 'less than 1 year' end from (
select  count(*) countss, datedur_period2 - datedur_period1 as difference from LAB.date_gap3 
group by 2 
) b group by case 
when difference > 2520 then 'greater than 7yr' 
when difference < 2520 and difference > 2160 then '7 year'
when difference < 2160 and difference > 1800 then '6 year' 
when difference < 1800 and difference > 1440 then '5 year'
when difference < 1440 and difference > 1080 then '4 year'
when difference < 1080 and difference > 720 then '3 year'
when difference < 720 and difference > 360 then '2 year'
when difference < 360 then 'less than 1 year' end 


Enjoy!  Scott

TDScott 5 posts Joined 09/14
31 Aug 2015

whoops, minor tweak to previous post, don't know how to edit so following up here, my date ranges at the end are bad.  2555 etc... 

jainayush007 50 posts Joined 03/11
10 Apr 2016

Hi Dieter, Do you have fix the error - ""The output grouping column list does not match with input grouping list"
I get the same error as well.My grpcol is varchar types. I did try to change the below as well but didnt work-
RETURNS (grpcol VARCHAR, pd PERIOD(DATE), cnt INT)
 
Thanks.

jainayush007 50 posts Joined 03/11
10 Apr 2016

I think, only INT works with it. How can I use it in a view? Or store its result in a table? Or use it as part of a larger query.

dnoeth 4628 posts Joined 11/04
10 Apr 2016

It's working for other data types, too, but you have to define them accordingly.
 
For a VarChar it must be a matching definition including the chracter set:
RETURNS (grpcol VARCHAR(n) CHARACTER SET LATIN
 
If it's UNICODE you must double the defined size in RETURN:
RETURNS(core_target_group_cd VARCHAR(2*n) CHARACTER SET UNICODE

Better don't ask why :)
 
And you can use it like any other Derived Table...

Dieter

jainayush007 50 posts Joined 03/11
11 Apr 2016

Hi Dieter, Thanks for your response.
Not able to execute this as a derived table-
sel distinct pstncd from(
WITH cte(pstncd, effper) AS
(
SELECT cast(pstncd as int) , effper
FROM cte_base
)
SELECT *
FROM TABLE
( TD_SYSFNLIB.TD_NORMALIZE_OVERLAP_meet(new Variant_type(cte.pstncd), cte.effper)
RETURNS (pstncd int, effper period (date))
HASH BY pstncd
LOCAL ORDER BY pstncd, effper
) AS DT
) A;

dnoeth 4628 posts Joined 11/04
11 Apr 2016

You can't nest a Common Table Expression (cte = WITH) in a Derived Table (dt = FROM (SELECT)).

WITH cte(pstncd, effper) AS
(
SELECT CAST(pstncd AS INT) , effper
FROM cte_base
)
SELECT  DISTINCT pstncd 
FROM TABLE
( TD_SYSFNLIB.TD_NORMALIZE_OVERLAP_MEET(NEW VARIANT_TYPE(cte.pstncd), cte.effper)
RETURNS (pstncd INT, effper PERIOD (DATE))
HASH BY pstncd
LOCAL ORDER BY pstncd, effper
) AS DT
;

 

Dieter

jainayush007 50 posts Joined 03/11
11 Apr 2016

Can I execute cte as a view? I have done it for recursive view. How do I do for this syntax?

dnoeth 4628 posts Joined 11/04
11 Apr 2016

You can't create a view with a CTE, don't know why.
 
 
You can materialize the result using

INSERT INTO target -- or  CREATE TABLE newtab AS (
WITH cte...

 
If you're on TD14.10 you might have a look at SELECT NORMALIZE, which can be used in a View:
http://www.info.teradata.com/HTMLPubs/DB_TTU_15_10/SQL_Reference/B035_1146_151K/ch01.021.056.html

Dieter

asmi1812 1 post Joined 01/13
11 May 2016

I need some help on merging of overlapping and consecutive date range. After merging I need to update the EFF_DT and EXP_DT if the membership dates of current and  previous row  or current and next row overlaps. Here Membership E dates overides P and G and membership P dates overrides G.


Source Data
Key1	Membership	EFF_DT	EXP_DT
YW82792	P  	9/25/2000	2/28/2002
YW82792	P  	3/1/2002	2/28/2003
YW82792	G  	3/1/2003	2/29/2004
YW82792	P  	10/30/2006	2/29/2008
YW82792	G  	3/1/2008	2/28/2009
YW82792	P  	10/26/2010	2/29/2012
YW82792	P  	12/16/2011	2/28/2013
YW82792	G  	1/8/2013	2/28/2014
YW82792	P  	5/16/2014	9/15/2014




Expected Output
Key1	Membership	EFF_DT	EXP_DT
YW82792	P  	9/25/2000	2/28/2003
YW82792	G  	3/1/2003	2/29/2004
YW82792	P  	10/30/2006	2/29/2008
YW82792	G  	3/1/2008	2/28/2009
YW82792	P  	10/26/2010	2/28/2013
YW82792	G  	3/1/2013	2/28/2014
YW82792	P  	5/16/2014	9/15/2014

 
I do appreciate your help.

You must sign in to leave a comment.