Database Welcome to the Teradata Forums. Read the guidelines on posting. Email notifications and RSS feeds are available, and also a chronological list of all forum activity.
 Print All Forums Database barthouston 4 posts Joined 04/14 22 Apr 2014 Cumulative Sum by distinct customer ID by year Hi - I've seen some posts here regarding the SUM OVER function to calculate the cumulative sum.  My problem at hand is slightly different, where I need to keep a running total of distinct purchasers by year: Year        #Distinct Customers         #Cumulative (Distinct Customers) 2010                  150                                         150 2011                  180                                         310  <-- Note that the count is smaller than just (150+180), because some may be repeat purchasers 2012                  200                                         500  <-- same thing here as above, would like to exclude repeat purchasers 2013                  250                                         750  <-- just so happens that all 250 customers in 2013 were all new 2014                  190                                         805    Could anyone please let me know if this can somehow work with SUM OVER Partition?  Right now I have, but it only gives a simple running total.SELECT year, count(distinct cust_id) as cust_count,SUM(cust_count) OVER (PARTITION BY year ORDER BY year ROWS UNBOUNDED PRECEDING ) AS cumul_sum   Thanks! Carey dnoeth 4628 posts Joined 11/04 23 Apr 2014 Hi Carey, I've done a similar query, new vs. existing customers for every period :-) There's no way to do it with a simple SUM OVER as you can't use DISTINCT within an OLAP function. ```SELECT yr, COUNT(*) AS distinct_customers ,SUM(flag) AS new_customers ,SUM(SUM(flag)) OVER (ORDER BY yr ROWS UNBOUNDED PRECEDING) AS cumulative_new_customers FROM ( SELECT cust_id ,EXTRACT(YEAR FROM order_dt) AS yr ,CASE -- find the first year when a customer placed an order WHEN EXTRACT(YEAR FROM order_dt) = MIN(EXTRACT(YEAR FROM order_dt)) OVER (PARTITION BY cust_id) THEN 1 ELSE 0 END AS flag FROM orders GROUP BY yr,cust_id ) AS dt GROUP BY 1```  Dieter barthouston 4 posts Joined 04/14 23 Apr 2014 Thank you so much, Dieter.  This is exactly what I needed!! CarlosAL 512 posts Joined 04/08 25 Apr 2014 Hi all: Sorry for jumping in (and jumping in late) I'm not sure about requirements but doesn't a scalar subquery do the trick? SELECT a.year,        count(distinct a.cust_id) as cust_count,        (select count(distinct b.cust_id) from mytable b where b.year <= a.year) from mytable a group by a.year order by a.year;   Cheers. Carlos.     dnoeth 4628 posts Joined 11/04 25 Apr 2014 Hi Carlos, of course this should work, too. But the Explain wil be horrible due to the non-equi-Join, including a bad Product Join with a huge intermediate spool. I always try to avoid Scalar Subqueries like this, most of them are easily rewritten using OLAP functions, etc. :-) Dieter CarlosAL 512 posts Joined 04/08 28 Apr 2014 Hi Dieter: Agreed with the horrible query plan. Agreed with the scalar subqueries (I don't like them either). BUT: Sometimes the simplicity of the query can be a adequate solution for many users (Not everybody is able to develop a query like yours. "Easily rewritten" is a fuzzy concept here ;-) ). Cheers. Carlos. ymuharram 1 post Joined 04/14 28 Apr 2014 Hi - I am new to Teradata SQL and I have a problem like this. I need to get the LATEST record from each customer. To get this, I created a derived table, but it doesn't work. Could anyone has the idea please? Here is my codes: select top 100* from CUST_CST_CUAD as CUST where PERSON_ID=tempo.PERSON_ID and CUST.DOE=tempo.take order by PERSON_ID, DOE,   (select PERSON_ID ,max(DOE) as take from CUST_CST_CUAD group by PERSON_ID) as tempo;   Cheers, Yusuf       dnoeth 4628 posts Joined 11/04 28 Apr 2014 @Carlos: you're right, this specific query was not "easily rewritten", but it's also not an easy question :-)   @Yusuf: The Deried Table must be in the FROM clause: ```select top 100* from CUST_CST_CUAD as CUST, (select PERSON_ID ,max(DOE) as take from CUST_CST_CUAD group by PERSON_ID) as tempo where PERSON_ID=tempo.PERSON_ID and CUST.DOE=tempo.take order by PERSON_ID, DOE```But you can simply use an OLAP function: ```select * from CUST_CST_CUAD as CUST qualify MAX(DOE) OVER (PARTITION BY PERSON_ID) = DOE; select * from CUST_CST_CUAD as CUST qualify RANK() OVER (PARTITION BY PERSON_ID ORDER BY DOE DESC) = 1;```Both return exactly the same result, there might be more than one row per customer if there are multiole rows for the same max date (If you want only a single row, you must add another column to ORDER BY or switch to ROW_NUMBER instead of RANK) Dieter Gannu67 3 posts Joined 02/15 07 Feb 2015 I have below record set ``` Customer col1 col2 col3 col4 col5 123 2920292109800000000 352008063830068 6147439366 7/2/2014 VZ 123 2920292109800000000 352008063988684 6147438242 7/2/2014 VZ 123 2920292109800000000 352005063348786 6145379130 7/2/2014 VZ 123 2920292109800000000 352005063125267 6147435989 7/2/2014 VZ 123 2920292109800000000 352008063805045 6147437954 7/2/2014 VZ 123 2920029207100030000 35200006841935 6147437954 6/29/2014 SP 123 2920029207100030000 35200406638684 6147435989 6/29/2014 SP 123 2920029207100030000 35200406629904 6147439979 6/29/2014 SP 123 2920029207100030000 35200006817715 6147438242 6/29/2014 SP 123 2920029207100030000 35200006840965 6147439366 6/29/2014 SP 123 2920029206700020000 352006063253877 6145512232 6/24/2014 VZ 123 2920029206700020000 352006063184395 6145511878 6/24/2014 VZ 123 2920029206700020000 352008064038380 6145511808 6/24/2014 VZ 123 2920029206700020000 352006063342274 6145512162 6/24/2014 VZ 123 2920029206700020000 352006063493580 6145510997 6/24/2014 VZ 123 2920002926800000000 35200006843597 6143524054 6/23/2014 SP 123 2920002926800000000 35200006858820 6143520916 6/23/2014 SP 123 2920002926800000000 35200006776180 6143521670 6/23/2014 SP 123 2920002926800000000 35200006802267 6143520510 6/23/2014 SP 123 2920002926800000000 35200406639117 6143523467 6/23/2014 SP```Desired result set: ```cus col4 col5 Sum(distinct count) order by col4 123 7/2/2014 VZ 2 123 6/29/2014 SP 2 123 6/24/2014 VZ 2 123 6/23/2014 SP 1```  dnoeth 4628 posts Joined 11/04 08 Feb 2015 What are the rules to get the desired result? What is "Sum(distinct count) order by col4"? Dieter Gannu67 3 posts Joined 02/15 08 Feb 2015 Sorry for the confusion. I need to get distinct count of col5. Need to populate the aggregrate in decremental order.   `1. 123 7/2/2014    VZ  2 -- here two is count which is distinct count(VZ,SP,VZ,SP) ` `2. 123 6/29/2014   SP  2 `-- here two is count which is distinct count(SP,VZ,SP) `3. 123 6/24/2014   VZ  2`-- here two is count which is distinct count(VZ,SP) `4. 123 6/23/2014   SP  1`-- here one is the count which is distinct count(SP only) and there are no rows below.   I need to calculate the count col. like mentioned above dnoeth 4628 posts Joined 11/04 09 Feb 2015 You need to nest OLAP functions, tag the first row with a new col5 value and then do a cumulative sum: ```SELECT customer, col4, col5, SUM(x) OVER (PARTITION BY customer ORDER BY col4 ROWS UNBOUNDED PRECEDING) FROM ( SELECT customer, col4, col5, CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer, col5 ORDER BY col4) = 1 THEN 1 ELSE 0 END AS x FROM tab GROUP BY 1,2,3 ) AS dt ```  Dieter Gannu67 3 posts Joined 02/15 11 Feb 2015 Thanks Dieter.  shavyani 23 posts Joined 03/15 05 Apr 2016 Hello Folks,   I have a requirement and should bring d data in below format (col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30)    Total A.        Xx.     Vvv.          532                                                                             532 B.      Cc          Cv.                                           453                                            453 c.       Ft.         As.                                                                     543.                543            From d above example the date1 and date 2 are date Columns and it's being compared and difference is found as number of days . i.e. if date1>date 2 then date1-date2 else date2-date1 which would giv d number of days difference.   Now based on this number of days difference calculation the SUM is calculated group by on column 1 and will come under the respective days bucket, either (Sum for days1-10.)or (Sum days11-20) or  (sumdays21-30).   And the last column is the SUM of that specific row.   Hope I made my requirement clear, now how do I handle this at the Teradata query Level, Please help. Thanks in Advance. Best Regards, Shavyani :) AtardecerR0j0 71 posts Joined 09/12 06 Apr 2016 Hi shavyani I don't know what do you want to do exactly but I think this can help you     Be More!! AtardecerR0j0 71 posts Joined 09/12 06 Apr 2016 Hi shavyani I don't know what do you want to do exactly but I think this can help you   ```create multiset volatile table MyDates ( date1 date, date2 date )primary index( date1 ) on commit preserve rows; insert into MyDates select current_date+random(-15,15), current_date+random(-15,15) from sys_calendar.calendar sample 100; select   date1,   date2,   case when diferencia between 1 and 10 then diferencia else 0 end as Days_1_10,   case when diferencia between 11 and 20 then diferencia else 0 end as Days_11_20,   case when diferencia between 21 and 30 then diferencia else 0 end as Days_21_30,   abs(date1-date2) as diferencia from MyDates where abs(date1-date2) between 1 and 30 ```  Be More!! yuvaevergreen 93 posts Joined 07/09 06 Apr 2016 SEL  CUST, DT1, DT2, CASE   WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10, CASE   WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20, CASE   WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30, CASE   WHEN DIFF <1 OR  DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF, SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM  (SEL   CUST, DT1, DT2, SUM( CASE  WHEN DT1 > DT2 THEN DT1-DT2 WHEN DT2 > DT1 THEN DT2-DT1 ELSE 0 END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF FROM  ( SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A UNION ALL SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B UNION ALL SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C UNION ALL SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A ) A ) A shavyani 23 posts Joined 03/15 06 Apr 2016 Thanks ton Atar and  Yuvae I shall implement this and get back. Yuvae a small clarfication I would like to have , here is the union all clauses is being done based on the Cust value but if der are 1000 Cust values we wouldn't b able to write as many Union All, So is there an alternate to this Union All loop. Please Advice. Thanks in Advance :) Best Regards, Shavyani dnoeth 4628 posts Joined 11/04 06 Apr 2016 The UNION is just for creating the base data, of course this will be a Select on your actual table. Dieter shavyani 23 posts Joined 03/15 07 Apr 2016 Exactly thought the same. Thanks ton for clarifying. Regards, Shavyani shavyani 23 posts Joined 03/15 14 Apr 2016 Hello All , Hi All , In the same example as above , I will have to calculate the column wise "SUM" as mentioned in 'Bold" in the example below. (col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30)    TotalSUM                                533                          458                    543                1543 A.        Xx.     Vvv.          532                                                                             532 B.      Cc          Cv.                                           453                                            453 c.       Ft.         As.                                                                     543                  543    D       hh          xx              1                                                                              1         E        ff           mm                                             5                                             5         I want this SUM to be as the first row in the entire data set , I think of something like below that's in bold.... will this work? or is there any other function or workaround that I could use to calculate this , Please help SELECTCAST(NULL AS VARCHAR(20)) AS CUST,CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT1,CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT2,SUM(SUM_1TO10) , SUM(SUM_11TO20),SUM(SUM_21TO30),SUM(SUM_TOT ) UNION SEL  CUST, DT1, DT2, CASE   WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10, CASE   WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20, CASE   WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30, CASE   WHEN DIFF <1 OR  DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF, SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM  (SEL   CUST, DT1, DT2, SUM( CASE  WHEN DT1 > DT2 THEN DT1-DT2 WHEN DT2 > DT1 THEN DT2-DT1 ELSE 0 END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF FROM  ( SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A UNION ALL SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B UNION ALL SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C UNION ALL SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A ) A ) A Regards, Shavyani You must sign in to leave a comment. Active Posters