Hi Carey,
I've done a similar query, new vs. existing customers for every period :-)
There's no way to do it with a simple SUM OVER as you can't use DISTINCT within an OLAP function.
SELECT yr, COUNT(*) AS distinct_customers ,SUM(flag) AS new_customers ,SUM(SUM(flag)) OVER (ORDER BY yr ROWS UNBOUNDED PRECEDING) AS cumulative_new_customers FROM ( SELECT cust_id ,EXTRACT(YEAR FROM order_dt) AS yr ,CASE -- find the first year when a customer placed an order WHEN EXTRACT(YEAR FROM order_dt) = MIN(EXTRACT(YEAR FROM order_dt)) OVER (PARTITION BY cust_id) THEN 1 ELSE 0 END AS flag FROM orders GROUP BY yr,cust_id ) AS dt GROUP BY 1
Dieter
Thank you so much, Dieter. This is exactly what I needed!!
Hi all:
Sorry for jumping in (and jumping in late)
I'm not sure about requirements but doesn't a scalar subquery do the trick?
SELECT a.year,
count(distinct a.cust_id) as cust_count,
(select count(distinct b.cust_id) from mytable b where b.year <= a.year)
from mytable a
group by a.year
order by a.year;
Cheers.
Carlos.
Hi Carlos,
of course this should work, too.
But the Explain wil be horrible due to the non-equi-Join, including a bad Product Join with a huge intermediate spool. I always try to avoid Scalar Subqueries like this, most of them are easily rewritten using OLAP functions, etc. :-)
Dieter
Hi Dieter:
Agreed with the horrible query plan. Agreed with the scalar subqueries (I don't like them either).
BUT:
Sometimes the simplicity of the query can be a adequate solution for many users (Not everybody is able to develop a query like yours. "Easily rewritten" is a fuzzy concept here ;-) ).
Cheers.
Carlos.
Hi - I am new to Teradata SQL and I have a problem like this. I need to get the LATEST record from each customer.
To get this, I created a derived table, but it doesn't work. Could anyone has the idea please?
Here is my codes:
select top 100*
from CUST_CST_CUAD as CUST
where PERSON_ID=tempo.PERSON_ID and
CUST.DOE=tempo.take
order by PERSON_ID, DOE,
(select
PERSON_ID
,max(DOE) as take
from CUST_CST_CUAD
group by PERSON_ID) as tempo;
Cheers,
Yusuf
@Carlos:
you're right, this specific query was not "easily rewritten", but it's also not an easy question :-)
@Yusuf:
The Deried Table must be in the FROM clause:
select top 100* from CUST_CST_CUAD as CUST, (select PERSON_ID ,max(DOE) as take from CUST_CST_CUAD group by PERSON_ID) as tempo where PERSON_ID=tempo.PERSON_ID and CUST.DOE=tempo.take order by PERSON_ID, DOE
But you can simply use an OLAP function:
select * from CUST_CST_CUAD as CUST qualify MAX(DOE) OVER (PARTITION BY PERSON_ID) = DOE; select * from CUST_CST_CUAD as CUST qualify RANK() OVER (PARTITION BY PERSON_ID ORDER BY DOE DESC) = 1;
Both return exactly the same result, there might be more than one row per customer if there are multiole rows for the same max date (If you want only a single row, you must add another column to ORDER BY or switch to ROW_NUMBER instead of RANK)
Dieter
I have below record set
Customer col1 col2 col3 col4 col5 123 2920292109800000000 352008063830068 6147439366 7/2/2014 VZ 123 2920292109800000000 352008063988684 6147438242 7/2/2014 VZ 123 2920292109800000000 352005063348786 6145379130 7/2/2014 VZ 123 2920292109800000000 352005063125267 6147435989 7/2/2014 VZ 123 2920292109800000000 352008063805045 6147437954 7/2/2014 VZ 123 2920029207100030000 35200006841935 6147437954 6/29/2014 SP 123 2920029207100030000 35200406638684 6147435989 6/29/2014 SP 123 2920029207100030000 35200406629904 6147439979 6/29/2014 SP 123 2920029207100030000 35200006817715 6147438242 6/29/2014 SP 123 2920029207100030000 35200006840965 6147439366 6/29/2014 SP 123 2920029206700020000 352006063253877 6145512232 6/24/2014 VZ 123 2920029206700020000 352006063184395 6145511878 6/24/2014 VZ 123 2920029206700020000 352008064038380 6145511808 6/24/2014 VZ 123 2920029206700020000 352006063342274 6145512162 6/24/2014 VZ 123 2920029206700020000 352006063493580 6145510997 6/24/2014 VZ 123 2920002926800000000 35200006843597 6143524054 6/23/2014 SP 123 2920002926800000000 35200006858820 6143520916 6/23/2014 SP 123 2920002926800000000 35200006776180 6143521670 6/23/2014 SP 123 2920002926800000000 35200006802267 6143520510 6/23/2014 SP 123 2920002926800000000 35200406639117 6143523467 6/23/2014 SP
Desired result set:
cus col4 col5 Sum(distinct count) order by col4 123 7/2/2014 VZ 2 123 6/29/2014 SP 2 123 6/24/2014 VZ 2 123 6/23/2014 SP 1
What are the rules to get the desired result?
What is "Sum(distinct count) order by col4"?
Dieter
Sorry for the confusion. I need to get distinct count of col5. Need to populate the aggregrate in decremental order.
1. 123 7/2/2014 VZ 2 -- here two is count which is distinct count(VZ,SP,VZ,SP)
2. 123 6/29/2014 SP 2
-- here two is count which is distinct count(SP,VZ,SP)
3. 123 6/24/2014 VZ 2
-- here two is count which is distinct count(VZ,SP)
4. 123 6/23/2014 SP 1
-- here one is the count which is distinct count(SP only) and there are no rows below.
I need to calculate the count col. like mentioned above
You need to nest OLAP functions, tag the first row with a new col5 value and then do a cumulative sum:
SELECT customer, col4, col5, SUM(x) OVER (PARTITION BY customer ORDER BY col4 ROWS UNBOUNDED PRECEDING) FROM ( SELECT customer, col4, col5, CASE WHEN ROW_NUMBER() OVER (PARTITION BY customer, col5 ORDER BY col4) = 1 THEN 1 ELSE 0 END AS x FROM tab GROUP BY 1,2,3 ) AS dt
Dieter
Thanks Dieter.
Hello Folks,
I have a requirement and should bring d data in below format
(col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30) Total
A. Xx. Vvv. 532 532
B. Cc Cv. 453 453
c. Ft. As. 543. 543
From d above example the date1 and date 2 are date Columns and it's being compared and difference is found as number of days . i.e. if date1>date 2 then date1-date2 else date2-date1 which would giv d number of days difference.
Now based on this number of days difference calculation the SUM is calculated group by on column 1 and will come under the respective days bucket, either (Sum for days1-10.)or (Sum days11-20) or (sumdays21-30).
And the last column is the SUM of that specific row.
Hope I made my requirement clear, now how do I handle this at the Teradata query Level, Please help.
Thanks in Advance.
Best Regards,
Shavyani :)
Hi shavyani
I don't know what do you want to do exactly but I think this can help you
Be More!!
Hi shavyani
I don't know what do you want to do exactly but I think this can help you
create multiset volatile table MyDates ( date1 date, date2 date )primary index( date1 ) on commit preserve rows; insert into MyDates select current_date+random(-15,15), current_date+random(-15,15) from sys_calendar.calendar sample 100; select date1, date2, case when diferencia between 1 and 10 then diferencia else 0 end as Days_1_10, case when diferencia between 11 and 20 then diferencia else 0 end as Days_11_20, case when diferencia between 21 and 30 then diferencia else 0 end as Days_21_30, abs(date1-date2) as diferencia from MyDates where abs(date1-date2) between 1 and 30
Be More!!
SEL
CUST, DT1, DT2,
CASE WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10,
CASE WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20,
CASE WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30,
CASE WHEN DIFF <1 OR DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF,
SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM
(SEL CUST, DT1, DT2,
SUM(
CASE
WHEN DT1 > DT2 THEN DT1-DT2
WHEN DT2 > DT1 THEN DT2-DT1
ELSE 0
END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF
FROM
(
SEL * FROM (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B
UNION ALL
SEL * FROM (SEL 'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C
UNION ALL
SEL * FROM (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A
) A
) A
Thanks ton Atar and Yuvae I shall implement this and get back.
Yuvae a small clarfication I would like to have , here is the union all clauses is being done based on the Cust value but if der are 1000 Cust values we wouldn't b able to write as many Union All, So is there an alternate to this Union All loop.
Please Advice.
Thanks in Advance :)
Best Regards,
Shavyani
The UNION is just for creating the base data, of course this will be a Select on your actual table.
Dieter
Exactly thought the same. Thanks ton for clarifying.
Regards,
Shavyani
Hello All ,
Hi All ,
In the same example as above , I will have to calculate the column wise "SUM" as mentioned in 'Bold" in the example below.
(col1) (Date1) (Date2) (Sum for days1-10.) (Sum days11-20) (sumdays21-30) Total
SUM 533 458 543 1543
A. Xx. Vvv. 532 532
B. Cc Cv. 453 453
c. Ft. As. 543 543
D hh xx 1 1
E ff mm 5 5
I want this SUM to be as the first row in the entire data set ,
I think of something like below that's in bold.... will this work? or is there any other function or workaround that I could use to calculate this , Please help
SELECT
CAST(NULL AS VARCHAR(20)) AS CUST,
CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT1,
CAST(NULL AS DATE FORMAT 'DD-MMM'YYYY') AS DT2,
SUM(SUM_1TO10) , SUM(SUM_11TO20),SUM(SUM_21TO30),SUM(SUM_TOT )
UNION
SEL
CUST, DT1, DT2,
CASE WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10,
CASE WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20,
CASE WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30,
CASE WHEN DIFF <1 OR DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF,
SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM
(SEL CUST, DT1, DT2,
SUM(
CASE
WHEN DT1 > DT2 THEN DT1-DT2
WHEN DT2 > DT1 THEN DT2-DT1
ELSE 0
END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF
FROM
(
SEL * FROM (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B
UNION ALL
SEL * FROM (SEL 'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C
UNION ALL
SEL * FROM (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A
) A
) A
Regards,
Shavyani
Hi - I've seen some posts here regarding the SUM OVER function to calculate the cumulative sum. My problem at hand is slightly different, where I need to keep a running total of distinct purchasers by year:
Year #Distinct Customers #Cumulative (Distinct Customers)
2010 150 150
2011 180 310 <-- Note that the count is smaller than just (150+180), because some may be repeat purchasers
2012 200 500 <-- same thing here as above, would like to exclude repeat purchasers
2013 250 750 <-- just so happens that all 250 customers in 2013 were all new
2014 190 805
Could anyone please let me know if this can somehow work with SUM OVER Partition? Right now I have, but it only gives a simple running total.
SELECT year, count(distinct cust_id) as cust_count,
SUM(cust_count) OVER (PARTITION BY year ORDER BY year ROWS UNBOUNDED PRECEDING ) AS cumul_sum
Thanks!
Carey