gwenaellebarzic 7 posts Joined 05/13
20 Aug 2013
COUNT () OVER (PARTITION BY) and DISTINCT

Hello!
I'm creating this topic because I have a question about COUNT(my field) OVER (PARTITION BY my other field) and DISTINCT.
Here is the situation: I have the following rows in one table:
Name;Hair Colour;Timestamp
Ted;brown;t1
Ted;red;t2
Ted;brown;t3
Ted;blue;t4
Ben;blond;t5
Ben;green;t6
Ben;purple;t7
Ben;green;t8
 
I would like to obtain the following result:
Name;colour;Total Nb of distinct hair colours per name
Ted;brown;3
Ted;red;3
Ted;blue;3
Ben;blond;3
Ben;green;3
Ben;purple;3
 
I tried this:
SELECT
    name
    , hair_colour
    , COUNT(hair_colour) OVER (PARTITION BY name)
FROM
    MyTable;
 
But it gives me the following result:
Name;colour;Total Nb of distinct hair colours per name
Ted;brown;4
Ted;red;4
Ted;blue;4
Ben;blond;4
Ben;green;4
Ben;purple;4
 
I thought it might be a good idea to use DISTINCT inside the COUNT ... OVER, but unfortunately that is not permitted.
I'm new to Teradata, so I just hope this question doesn't look completely stupid :D.
 
Best regards.
 
Gwen

dnoeth 4628 posts Joined 11/04
20 Aug 2013

Hi Gwen,
SELECT DISTINCT is processed after the OLAP function and COUNT(DISTINCT) can't be used in OLAP functions, but in this case you don't need it. Just GROUP BY first; the window count is then evaluated on one row per name/hair colour combination:

SELECT
    name
    , hair_colour
    , COUNT(hair_colour) OVER (PARTITION BY name)
FROM
    MyTable
GROUP BY 
    name
    , hair_colour;
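
The same result can also be reached by removing duplicates in a derived table first and putting the window count on top; a sketch along the same lines:

SELECT
    name
    , hair_colour
    , COUNT(*) OVER (PARTITION BY name)  -- counts the deduplicated rows per name
FROM
    -- one row per distinct (name, hair_colour) combination
    (SELECT DISTINCT name, hair_colour FROM MyTable) AS dt;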

Dieter

clakshminarasu 3 posts Joined 05/15
22 Mar 2016

Hi Dieter, 
Just curious to know if this is possible:
I'm in the same situation where I need to use DISTINCT in an OLAP function, but I'm not able to.

COL_1	COL_2	COL_3	COL_4	COL_5	COL_6	COL_7	COL_8	COL_9	COL_10	COL_11	COL_12
A	B1	C1	D1	E1	F1	G1	H1	I1	J1	2	DATE1
A	B1	C1	D2	E2	F2	G2	H2	I1	J1	2	DATE1
A	B1	C1	D3	E3	F3	G3	H3	I1	J1	2	DATE1
A	B1	C1	D4	E4	F4	G4	H4	I1	J1	2	DATE1
A	B1	C1	D5	E5	F5	G5	H5	I1	J1	2	DATE1
A	B1	C1	D6	E6	F6	G6	H6	I1	J1	2	DATE1
A	B1	C1	D7	E7	F7	G7	H7	I1	J1	2	DATE1
A	B1	C1	D8	E8	F8	G8	H8	I1	J1	2	DATE1
A	B2	C2	D9	E9	F9	G9	H9	I2	J1	2	DATE1
A	B2	C2	D10	E10	F10	G10	H10	I2	J1	2	DATE1
A	B2	C2	D11	E11	F11	G11	H11	I2	J1	2	DATE1
A	B2	C2	D12	E12	F12	G12	H12	I2	J1	2	DATE1
A	B2	C2	D13	E13	F13	G13	H13	I2	J1	2	DATE1
A	B2	C2	D14	E14	F14	G14	H14	I2	J1	2	DATE1
A	B2	C2	D15	E15	F15	G15	H15	I2	J1	2	DATE1
A	B2	C2	D16	E16	F16	G16	H16	I2	J1	2	DATE1

I need COL_11 to be the distinct count of COL_2, and I'm supposed to partition by COL_1 and COL_12.

SELECT COL_1, COL_2, COL_3, COL_4, COL_5, COL_6, COL_7, COL_8, COL_9, COL_10,
COUNT( DISTINCT COL_2) over(partition by COL_1, COL_12 ) as NO_COL_2_VAL,
COL_12
FROM TBL

This query fails, and I have to find an alternative way to achieve this.
Can anybody please help?
 
Thanks
Lakshminarasu Chenduri

dnoeth 4628 posts Joined 11/04
23 Mar 2016

Hi Lakshminarasu,
As DISTINCT is not allowed, you need two nested OLAP functions:

SELECT ...
   SUM(flag) 
   OVER (PARTITION BY COL_1, COL_12)
FROM
 (
   SELECT ....
      -- tag only one of the duplicate values, the first in a group
      -- could be ROW_NUMBER, too, but this needs ORDER BY
      CASE WHEN MIN(COL_2) 
                OVER (PARTITION BY COL_1, COL_12, COL_2 -- includes the DISTINCT col
                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL
           THEN 1 
           ELSE 0 
      END AS flag
   FROM TBL
 ) AS dt
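
Filled in with the column names from your example, it would look roughly like this (an untested sketch):

SELECT
   COL_1, COL_2, COL_12,
   SUM(flag)
   OVER (PARTITION BY COL_1, COL_12) AS NO_COL_2_VAL
FROM
 (
   SELECT
      COL_1, COL_2, COL_12,
      -- flag is 1 only for the first row of each (COL_1, COL_12, COL_2) group
      CASE WHEN MIN(COL_2)
                OVER (PARTITION BY COL_1, COL_12, COL_2
                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) IS NULL
           THEN 1
           ELSE 0
      END AS flag
   FROM TBL
 ) AS dt;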

 

Dieter

clakshminarasu 3 posts Joined 05/15
26 Mar 2016

Thank you very much Dieter !!!
Lakshminarasu

shavyani 23 posts Joined 03/15
05 Apr 2016

Hello Folks,  
I have a requirement and need to bring the data into the format below:
Col1  Date1  Date2  Sum days 1-10  Sum days 11-20  Sum days 21-30  Total
A     Xx     Vvv              532                                    532
B     Cc     Cv                               453                    453
C     Ft     As                                               543    543
In the example above, Date1 and Date2 are date columns. They are compared and the difference is taken as a number of days, i.e. if Date1 > Date2 then Date1 - Date2, else Date2 - Date1, which gives the number of days difference (in other words ABS(Date1 - Date2)).
 
Now, based on this day difference, the SUM is calculated with a GROUP BY on column 1 and falls into the corresponding bucket: either (Sum days 1-10), (Sum days 11-20) or (Sum days 21-30).
 
And the last column is the SUM of that specific row.
 
I hope I made my requirement clear. Now, how do I handle this at the Teradata query level? Please help.
Thanks in Advance.
Best Regards,
Shavyani :)

yuvaevergreen 93 posts Joined 07/09
06 Apr 2016

Same reply as the other one...

shavyani 23 posts Joined 03/15
14 Apr 2016

Hi All ,
Using the same example as above, I will also have to calculate the column-wise "SUM" shown in the first row of the example below.
Col1  Date1  Date2  Sum days 1-10  Sum days 11-20  Sum days 21-30  Total
SUM                           533             458             543   1534
A     Xx     Vvv              532                                    532
B     Cc     Cv                               453                    453
C     Ft     As                                               543    543
D     hh     xx                 1                                      1
E     ff     mm                                 5                      5
I want this SUM row to be the first row in the entire data set.
I'm thinking of something like the query below; will this work? Or is there another function or workaround I could use to calculate this? Please help.
SELECT
CAST(NULL AS VARCHAR(20)) AS CUST,
CAST(NULL AS DATE FORMAT 'DD-MMM-YYYY') AS DT1,
CAST(NULL AS DATE FORMAT 'DD-MMM-YYYY') AS DT2,
SUM(SUM_1TO10), SUM(SUM_11TO20), SUM(SUM_21TO30), SUM(SUM_TOT)
UNION
SEL 
CUST, DT1, DT2,
CASE   WHEN DIFF >=1 AND DIFF <=10 THEN DIFF ELSE 0 END AS SUM_1TO10,
CASE   WHEN DIFF >=11 AND DIFF <=20 THEN DIFF ELSE 0 END AS SUM_11TO20,
CASE   WHEN DIFF >=21 AND DIFF <=30 THEN DIFF ELSE 0 END AS SUM_21TO30,
CASE   WHEN DIFF <1 OR  DIFF >30 THEN DIFF ELSE 0 END AS SUM_DEF,
SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT FROM 
(SEL   CUST, DT1, DT2,
SUM(
CASE 
WHEN DT1 > DT2 THEN DT1-DT2
WHEN DT2 > DT1 THEN DT2-DT1
ELSE 0
END) OVER (PARTITION BY CUST ORDER BY 1 ) AS DIFF
FROM 
(
SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B
UNION ALL
SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C
UNION ALL
SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A
) A
) A
Regards,
Shavyani

AtardecerR0j0 71 posts Joined 09/12
15 Apr 2016

Hi shavyani
I'm not sure if I understand you. Is something like this what you need?


select cust, dt1, dt2, abs(DT1-DT2) as diff,
       case when diff <= 10                then diff else 0 end AS SUM_1TO10,
       case when diff >  10 and diff <= 20 then diff else 0 end AS SUM_11TO20,
       case when diff >  20 and diff <= 30 then diff else 0 end AS SUM_21TO30,
       case when diff >  30                then diff else 0 end AS SUM_DEF,
       SUM_1TO10+SUM_11TO20+SUM_21TO30+SUM_DEF AS SUM_TOT,
       sum(SUM_1TO10) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_1TO10_WF,
       sum(SUM_11TO20) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_11TO20_WF,
       sum(SUM_21TO30) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_21TO30_WF,
       sum(SUM_DEF) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_DEF_WF,
       sum(SUM_TOT) over (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SUM_TOT_WF
FROM 
(
SEL * FROM   (SEL 'A' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM   (SEL 'B' AS CUST, DATE '2001-01-21' AS DT1, DATE '2001-01-01' AS DT2) B
UNION ALL
SEL * FROM   (SEL   'C' AS CUST, DATE '2001-01-01' AS DT1, DATE '2001-01-31' AS DT2) C
UNION ALL
SEL * FROM   (SEL 'A' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-11' AS DT2) A
UNION ALL
SEL * FROM   (SEL 'E' AS CUST, DATE '2001-02-21' AS DT1, DATE '2001-01-31' AS DT2) A
) A

Then you get this result:
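The values below are worked out by hand from the five sample rows rather than taken from an actual run:

CUST  DT1         DT2         diff  SUM_1TO10  SUM_11TO20  SUM_21TO30  SUM_DEF  SUM_TOT
A     2001-01-01  2001-01-11    10         10           0           0        0       10
B     2001-01-21  2001-01-01    20          0          20           0        0       20
C     2001-01-01  2001-01-31    30          0           0          30        0       30
A     2001-02-21  2001-01-11    41          0           0           0       41       41
E     2001-02-21  2001-01-31    21          0           0          21        0       21

The *_WF columns repeat the grand totals (10, 20, 51, 41 and 122) on every row.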

 

Be More!!

yuvaevergreen 93 posts Joined 07/09
21 Apr 2016

That should work...shavyani....

thebizyman 1 post Joined 07/16
27 Jul 2016

I see a lot of these posts are similar to what I need, yet a bit different. I'm trying to get each distinct PE_ID_id and count how many date_created values fall within the same hour, in this case 1 PM, 2 PM, 3 PM or 4 PM.
Looking to convert this:

PE_ID_id              date_created
0002070001020000      2016-07-22 13:29:03.6149950
0002070001020000      2016-07-22 13:33:17.9413390
0002070001020000      2016-07-22 14:14:08.7322900
0002070001020000      2016-07-22 14:40:41.0482080
0002070001020000      2016-07-22 14:52:33.3103760
0002070001020000      2016-07-22 15:04:47.1281310
0002070001020000      2016-07-22 15:17:41.0911360
0002070001020000      2016-07-22 15:35:40.8175370
0002070001020000      2016-07-22 15:40:16.7062010
0002070001020000      2016-07-22 15:50:16.5956700
0002070001020000      2016-07-22 16:07:57.7761330
0002070001040000      2016-07-22 16:08:36.3324330
0002070001040000      2016-07-22 15:39:49.0625400
0002070001040000      2016-07-22 15:51:02.1318350
0002070001040000      2016-07-22 15:16:19.0790620
0002070001040000      2016-07-22 15:29:05.9620040
0002070001040000      2016-07-22 14:51:07.1357490
0002070001040000      2016-07-22 15:01:34.5495330
0002070001040000      2016-07-22 14:41:27.5670580
0002070001040000      2016-07-22 14:33:56.5452800
0002070001040000      2016-07-22 14:20:47.4194970
0002070001040000      2016-07-22 14:00:40.4712860
0002070001040000      2016-07-22 13:41:27.1271900
0002070001040000      2016-07-22 13:20:26.3539930
0002070001040000      2016-07-22 13:24:10.4667830
0002070001060000      2016-07-22 13:24:29.1956960
0002070001060000      2016-07-22 13:28:18.7786330
0002070001060000      2016-07-22 13:53:45.7635220
0002070001060000      2016-07-22 14:13:51.6734760
0002070001060000      2016-07-22 14:23:03.6995150
0002070001060000      2016-07-22 14:36:15.3188540
0002070001060000      2016-07-22 14:48:14.1417780
0002070001060000      2016-07-22 15:00:05.5083840
0002070001060000      2016-07-22 15:23:46.4888220
0002070001060000      2016-07-22 15:10:56.0630850
0002070001060000      2016-07-22 15:35:58.8465860
0002070001060000      2016-07-22 16:07:40.8866710
0002070001060000      2016-07-22 15:45:38.4578730

 

to this:

PE_ID_id              1-2   2-3   3-4   4-5
0002070001020000        2     3     5     1
0002070001040000        3     5     5     1
0002070001060000        3     4     5     1
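
One way to get this pivot is a conditional count per hour; a sketch (untested, with my_table standing in for the actual source table):

SELECT
    PE_ID_id
    -- each bucket counts the rows whose date_created falls in that hour
    , SUM(CASE WHEN EXTRACT(HOUR FROM date_created) = 13 THEN 1 ELSE 0 END) AS hr_1_2
    , SUM(CASE WHEN EXTRACT(HOUR FROM date_created) = 14 THEN 1 ELSE 0 END) AS hr_2_3
    , SUM(CASE WHEN EXTRACT(HOUR FROM date_created) = 15 THEN 1 ELSE 0 END) AS hr_3_4
    , SUM(CASE WHEN EXTRACT(HOUR FROM date_created) = 16 THEN 1 ELSE 0 END) AS hr_4_5
FROM my_table
GROUP BY PE_ID_id;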
