All Forums Database
barthouston 4 posts Joined 04/14
22 Apr 2014
Cumulative Sum by distinct customer ID by year

Hi - I've seen some posts here regarding the SUM OVER function to calculate the cumulative sum.  My problem at hand is slightly different, where I need to keep a running total of distinct purchasers by year:
Year        #Distinct Customers         #Cumulative (Distinct Customers)
2010                  150                                         150
2011                  180                                         310  <-- Note that the count is smaller than just (150+180), because some may be repeat purchasers
2012                  200                                         500  <-- same thing here as above, would like to exclude repeat purchasers
2013                  250                                         750  <-- just so happens that all 250 customers in 2013 were all new
2014                  190                                         805 
 
Could anyone please let me know if this can somehow work with SUM OVER Partition?  Right now I have, but it only gives a simple running total.
SELECT year, count(distinct cust_id) as cust_count,
SUM(cust_count) OVER (PARTITION BY year ORDER BY year ROWS UNBOUNDED PRECEDING ) AS cumul_sum
 
Thanks!
Carey

dnoeth 4628 posts Joined 11/04
23 Apr 2014

Hi Carey,
I've done a similar query, new vs. existing customers for every period :-)
There's no way to do it with a simple SUM OVER as you can't use DISTINCT within an OLAP function.

SELECT yr,
   COUNT(*) AS distinct_customers
  ,SUM(flag) AS new_customers
  ,SUM(SUM(flag)) 
   OVER (ORDER BY yr
         ROWS UNBOUNDED PRECEDING) AS cumulative_new_customers 
FROM
 (
   SELECT
      cust_id
     ,EXTRACT(YEAR FROM order_dt) AS yr
     ,CASE  -- find the first year when a customer placed an order
         WHEN EXTRACT(YEAR FROM order_dt) 
            = MIN(EXTRACT(YEAR FROM order_dt)) 
              OVER (PARTITION BY cust_id) 
         THEN 1 
         ELSE 0 
      END AS flag
   FROM orders
   GROUP BY yr,cust_id
 ) AS dt
GROUP BY 1

 

Dieter

barthouston 4 posts Joined 04/14
23 Apr 2014

Thank you so much, Dieter.  This is exactly what I needed!!

CarlosAL 512 posts Joined 04/08
25 Apr 2014

Hi all:
Sorry for jumping in (and jumping in late)
I'm not sure about requirements but doesn't a scalar subquery do the trick?
SELECT a.year,
       count(distinct a.cust_id) as cust_count,
       (select count(distinct b.cust_id) from mytable b where b.year <= a.year)
from mytable a
group by a.year
order by a.year;
 
Cheers.
Carlos.
 
 

dnoeth 4628 posts Joined 11/04
25 Apr 2014

Hi Carlos,
of course this should work, too.
But the Explain wil be horrible due to the non-equi-Join, including a bad Product Join with a huge intermediate spool. I always try to avoid Scalar Subqueries like this, most of them are easily rewritten using OLAP functions, etc. :-)

Dieter

CarlosAL 512 posts Joined 04/08
28 Apr 2014

Hi Dieter:
Agreed with the horrible query plan. Agreed with the scalar subqueries (I don't like them either).
BUT:
Sometimes the simplicity of the query can be a adequate solution for many users (Not everybody is able to develop a query like yours. "Easily rewritten" is a fuzzy concept here ;-) ).
Cheers.
Carlos.

ymuharram 1 post Joined 04/14
28 Apr 2014

Hi - I am new to Teradata SQL and I have a problem like this. I need to get the LATEST record from each customer.
To get this, I created a derived table, but it doesn't work. Could anyone has the idea please?
Here is my codes:
select top 100*
from CUST_CST_CUAD as CUST
where PERSON_ID=tempo.PERSON_ID and
CUST.DOE=tempo.take
order by PERSON_ID, DOE,
 
(select
PERSON_ID
,max(DOE) as take
from CUST_CST_CUAD
group by PERSON_ID) as tempo;
 
Cheers,
Yusuf
 
 
 

dnoeth 4628 posts Joined 11/04
28 Apr 2014

@Carlos:
you're right, this specific query was not "easily rewritten", but it's also not an easy question :-)
 
@Yusuf:
The Deried Table must be in the FROM clause:

select top 100*
from CUST_CST_CUAD as CUST,
  (select
   PERSON_ID
     ,max(DOE) as take
   from CUST_CST_CUAD
   group by PERSON_ID) as tempo
where PERSON_ID=tempo.PERSON_ID and
CUST.DOE=tempo.take
order by PERSON_ID, DOE

But you can simply use an OLAP function:

select *
from CUST_CST_CUAD as CUST
qualify
   MAX(DOE) 
   OVER (PARTITION BY PERSON_ID) = DOE;

select *
from CUST_CST_CUAD as CUST
qualify
   RANK() 
   OVER (PARTITION BY PERSON_ID
         ORDER BY DOE DESC) = 1;

Both return exactly the same result, there might be more than one row per customer if there are multiole rows for the same max date (If you want only a single row, you must add another column to ORDER BY or switch to ROW_NUMBER instead of RANK)

Dieter

Gannu67 3 posts Joined 02/15
07 Feb 2015

I have below record set


Customer	col1		col2		col3		col4		col5
123	2920292109800000000	352008063830068	6147439366	7/2/2014	VZ
123	2920292109800000000	352008063988684	6147438242	7/2/2014	VZ
123	2920292109800000000	352005063348786	6145379130	7/2/2014	VZ
123	2920292109800000000	352005063125267	6147435989	7/2/2014	VZ
123	2920292109800000000	352008063805045	6147437954	7/2/2014	VZ
123	2920029207100030000	35200006841935	6147437954	6/29/2014	SP
123	2920029207100030000	35200406638684	6147435989	6/29/2014	SP
123	2920029207100030000	35200406629904	6147439979	6/29/2014	SP
123	2920029207100030000	35200006817715	6147438242	6/29/2014	SP
123	2920029207100030000	35200006840965	6147439366	6/29/2014	SP
123	2920029206700020000	352006063253877	6145512232	6/24/2014	VZ
123	2920029206700020000	352006063184395	6145511878	6/24/2014	VZ
123	2920029206700020000	352008064038380	6145511808	6/24/2014	VZ
123	2920029206700020000	352006063342274	6145512162	6/24/2014	VZ
123	2920029206700020000	352006063493580	6145510997	6/24/2014	VZ
123	2920002926800000000	35200006843597	6143524054	6/23/2014	SP
123	2920002926800000000	35200006858820	6143520916	6/23/2014	SP
123	2920002926800000000	35200006776180	6143521670	6/23/2014	SP
123	2920002926800000000	35200006802267	6143520510	6/23/2014	SP
123	2920002926800000000	35200406639117	6143523467	6/23/2014	SP

Desired result set:

cus	col4		col5	Sum(distinct count) order by col4
123	7/2/2014	VZ	2
123	6/29/2014	SP	2
123	6/24/2014	VZ	2
123	6/23/2014	SP	1

 

dnoeth 4628 posts Joined 11/04
08 Feb 2015

What are the rules to get the desired result?
What is "Sum(distinct count) order by col4"?

Dieter

Gannu67 3 posts Joined 02/15
08 Feb 2015

Sorry for the confusion. I need to get distinct count of col5. Need to populate the aggregrate in decremental order.

 

1. 123 7/2/2014    VZ  2 -- here two is count which is distinct count(VZ,SP,VZ,SP)

2. 123 6/29/2014   SP  2 -- here two is count which is distinct count(SP,VZ,SP)

3. 123 6/24/2014   VZ  2-- here two is count which is distinct count(VZ,SP)

4. 123 6/23/2014   SP  1-- here one is the count which is distinct count(SP only) and there are no rows below.

 

I need to calculate the count col. like mentioned above

dnoeth 4628 posts Joined 11/04
09 Feb 2015

You need to nest OLAP functions, tag the first row with a new col5 value and then do a cumulative sum:

SELECT customer, col4, col5, 
   SUM(x) 
   OVER (PARTITION BY customer
         ORDER BY col4
         ROWS UNBOUNDED PRECEDING)
FROM
 (
   SELECT customer, col4, col5,
      CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer, col5 ORDER BY col4) = 1 THEN 1 ELSE 0 END AS x
   FROM tab
   GROUP BY 1,2,3
 ) AS dt

 

Dieter

Gannu67 3 posts Joined 02/15
11 Feb 2015

Thanks Dieter. 

shavyani 23 posts Joined 03/15
05 Apr 2016

Hello Folks,  
I have a requirement and should bring d data in below format
(col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30)    Total
A.        Xx.     Vvv.          532                                                                             532
B.      Cc          Cv.                                           453                                            453
c.       Ft.         As.                                                                     543.                543           
From d above example the date1 and date 2 are date Columns and it's being compared and difference is found as number of days . i.e. if date1>date 2 then date1-date2 else date2-date1 which would giv d number of days difference.
 
Now based on this number of days difference calculation the SUM is calculated group by on column 1 and will come under the respective days bucket, either (Sum for days1-10.)or (Sum days11-20) or  (sumdays21-30).
 
And the last column is the SUM of that specific row.
 
Hope I made my requirement clear, now how do I handle this at the Teradata query Level, Please help.
Thanks in Advance.
Best Regards,
Shavyani :)

AtardecerR0j0 71 posts Joined 09/12
06 Apr 2016

Hi shavyani
I don't know what do you want to do exactly but I think this can help you
 

 

Be More!!

AtardecerR0j0 71 posts Joined 09/12
06 Apr 2016

Hi shavyani
I don't know what do you want to do exactly but I think this can help you
 

create multiset volatile table MyDates
(
date1 date,
date2 date
)primary index( date1 )
on commit preserve rows;

insert into MyDates
select current_date+random(-15,15), current_date+random(-15,15)
from sys_calendar.calendar
sample 100;

select
  date1,
  date2,
  case when diferencia between 1 and 10 then diferencia else 0 end as Days_1_10,
  case when diferencia between 11 and 20 then diferencia else 0 end as Days_11_20,
  case when diferencia between 21 and 30 then diferencia else 0 end as Days_21_30,
  abs(date1-date2) as diferencia
from MyDates
where abs(date1-date2) between 1 and 30

 

Be More!!

yuvaevergreen 93 posts Joined 07/09
06 Apr 2016

SEL 

CUST, DT1, DT2,

CASE   WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10,

CASE   WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20,

CASE   WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30,

CASE   WHEN DIFF <1 OR  DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF,

SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM 

(SEL   CUST, DT1, DT2,

SUM(

CASE 

WHEN DT1 > DT2 THEN DT1-DT2

WHEN DT2 > DT1 THEN DT2-DT1

ELSE 0

END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF

FROM 

(

SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A

UNION ALL

SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B

UNION ALL

SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C

UNION ALL

SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A

) A

) A

shavyani 23 posts Joined 03/15
06 Apr 2016

Thanks ton Atar and  Yuvae I shall implement this and get back.
Yuvae a small clarfication I would like to have , here is the union all clauses is being done based on the Cust value but if der are 1000 Cust values we wouldn't b able to write as many Union All, So is there an alternate to this Union All loop.
Please Advice.
Thanks in Advance :)
Best Regards,
Shavyani

dnoeth 4628 posts Joined 11/04
06 Apr 2016

The UNION is just for creating the base data, of course this will be a Select on your actual table.

Dieter

shavyani 23 posts Joined 03/15
07 Apr 2016

Exactly thought the same. Thanks ton for clarifying.

Regards,
Shavyani

shavyani 23 posts Joined 03/15
14 Apr 2016

Hello All ,
Hi All ,
In the same example as above , I will have to calculate the column wise "SUM" as mentioned in 'Bold" in the example below.
(col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30)    Total
SUM                                533                          458                    543                1543
A.        Xx.     Vvv.          532                                                                             532
B.      Cc          Cv.                                           453                                            453
c.       Ft.         As.                                                                     543                  543   
D       hh          xx              1                                                                              1        
E        ff           mm                                             5                                             5         
I want this SUM to be as the first row in the entire data set , 
I think of something like below that's in bold.... will this work? or is there any other function or workaround that I could use to calculate this , Please help 
SELECT
CAST(NULL AS VARCHAR(20)) AS CUST,
CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT1,
CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT2,
SUM(SUM_1TO10) , SUM(SUM_11TO20),SUM(SUM_21TO30),SUM(SUM_TOT ) 
UNION
SEL 
CUST, DT1, DT2,
CASE   WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10,
CASE   WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20,
CASE   WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30,
CASE   WHEN DIFF <1 OR  DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF,
SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM 
(SEL   CUST, DT1, DT2,
SUM(
CASE 
WHEN DT1 > DT2 THEN DT1-DT2
WHEN DT2 > DT1 THEN DT2-DT1
ELSE 0
END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF
FROM 
(
SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B
UNION ALL
SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C
UNION ALL
SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A
) A
) A
Regards,
Shavyani

You must sign in to leave a comment.