All Forums Database
16 Jul 2014
Group by on a Group by not working as expected!!!

I run into a problem with 2 ”group by” in one query, it is not working as I expected it to do. Only when I force the inner result into a separate table it is doing what I expected it to do. Why are the below 3 queryes not giving the same result?

DROP TABLE T1;
CREATE MULTISET VOLATILE TABLE T1 (D DATE, V INTEGER) ON COMMIT PRESERVE ROWS;
DELETE FROM T1;
INSERT INTO T1 VALUES ('2013-05-06', 123);
INSERT INTO T1 VALUES ('2013-05-07', 456);
INSERT INTO T1 VALUES ('2014-05-01', 789);
INSERT INTO T1 VALUES ('2014-06-23', 012);

------------------------------------------------------ --------
-- UPS! Not Working
------------------------------------------------------ --------
SELECT Y, SUM(V) AS V
  FROM (
        SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
          FROM T1
         GROUP BY D
       ) AS x
 GROUP BY Y;

------------------------------------------------------ --------
-- UPS! Not Working
------------------------------------------------------ --------
WITH T2 (Y, V)
AS
(
SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
  FROM T1
 GROUP BY D
)

SELECT Y, SUM(V) AS V
  FROM T2 AS b
 GROUP BY Y;

------------------------------------------------------ --------
-- YEA! Working!!!
------------------------------------------------------ ---------
CREATE MULTISET VOLATILE TABLE T3
AS
(
SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
  FROM T1
 GROUP BY D
) WITH DATA ON COMMIT PRESERVE ROWS;

SELECT Y, SUM(V) AS V
  FROM T3 AS b
 GROUP BY Y;

Peter Schwennesen

ulrich 816 posts Joined 09/09
17 Jul 2014

Can you explain why you think you need the two group by?
What is the difference to 

SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
FROM T1
GROUP BY Y

?

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

ToddAWalter 316 posts Joined 10/11
17 Jul 2014

In addition to agreeing with Ulrich, I also note that you will often get confusing results when you use an alias with the same name as a column which is in the scope of the query - in this case, the SUM(V) AS V. A later reference will be resolved to the underlying column rather than the alias since that takes precendence when this ambiguity arises.

17 Jul 2014

Hi
The task I want to do is, find the total run time of several processes each day: the first SUM.
Next I want to find the average run time each year: AVG (I see that I have SUM in the outer group, this is an error! But it do not make the SQL Code invalid)
Therefore first I sum over each day, and get the total run time each day, and then I extract the year from the date and perform an average over each day run time for each year.
I know that this may be performed in other ways, and maybe with better performance. But for me it was easy to understand to first cum over a DATE, and then extract the YEAR from the date and “SUM” (actually AVG) over the year, but when trying grouping over the YEAR I got the same year value back where I had expected only one row for each YEAR.
The last example where I place the date sum in a table and then use this table to sum over the year give me the result I have expected. My problem here is to understand why the two preceding Queries doses not produce the same result.
Me and my college tried using a CAST(SUBSTR(DATE, 1,4) as CHAR(4)), to change the DATE to a Year value, and doing so produced the correct result!
Peter

17 Jul 2014

Comment, this is just a test script, not something in production. Therefore the short names of V, D, Y.
Peter

17 Jul 2014

Maybe it something with the EXTRACT function??
This WORKS:

SELECT Y, AVG(V) AS V
  FROM (
        SELECT SUBSTR(D,1,4) AS Y, SUM(V) AS V
          FROM T1
         GROUP BY D
       ) AS x
 GROUP BY Y;

But this DOES NOT:

SELECT Y, AVG(V) AS V
  FROM (
        SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
          FROM T1
         GROUP BY D
       ) AS x
 GROUP BY Y;

Peter

dnoeth 4628 posts Joined 11/04
17 Jul 2014

Hi Peter,
what's the error you get, a 3504?
I just tried it and all SELECTs worked  on my system :-)
So the optmizer might be more clever in a newer release and notices the dependency between d and EXTRACT(YEAR FROM d)
The logic (group by the date, but return only the year) is a bit complicated and you need to think twice before you understand it. You could rewrite it:

SELECT Y, AVG(V) AS V
  FROM (
        SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
          FROM T1
         GROUP BY d,y
       ) AS x
 GROUP BY Y;

or 

SELECT EXTRACT(YEAR FROM D), AVG(V) AS V
  FROM (
        SELECT d, SUM(V) AS V
          FROM T1
         GROUP BY d
       ) AS x
 GROUP BY 1;
 

 

Dieter

22 Jul 2014

Hi Dieter
I do not get any errors. The queries just return a result set. I expected all 3 examples to return only 2 rows with the average number of sum pr day.
What I do not understand is that when the inner SELECT produce a result that should result in averaging to only 2 lines, this is not the case when the select is inside another select, but when I use a middle table to store it in before performing the outer SELECT it is working as I expect.
We are on Teradata 13.10 now; I expect that we will move to 14.xx in a short time.
Maybe this issue are related to cases where I have seen the using a row of temp-tables makes a query run much faster than putting it all together in one big SELECT statement?
When running this example on my system I get the following results (with no errors!):
Peter
 

DROP TABLE T1;
CREATE MULTISET VOLATILE TABLE T1 (D DATE, V INTEGER) ON COMMIT PRESERVE ROWS;
DELETE FROM T1;
INSERT INTO T1 VALUES ('2013-05-06', 5);
INSERT INTO T1 VALUES ('2013-05-06', 10);
INSERT INTO T1 VALUES ('2013-05-07', 10);
INSERT INTO T1 VALUES ('2013-05-07', 15);
INSERT INTO T1 VALUES ('2014-05-01', 1);
INSERT INTO T1 VALUES ('2014-05-01', 2);
INSERT INTO T1 VALUES ('2014-06-23', 2);
INSERT INTO T1 VALUES ('2014-06-23', 4);
------------------------------------------------------ --------
-- UPS! Not Working
------------------------------------------------------ --------
SELECT Y, AVG(V) AS V
  FROM (
        SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
          FROM T1
         GROUP BY D
       ) AS x
 GROUP BY Y;
	Y	V
1	2.014	6,00
2	2.013	15,00
3	2.014	3,00
4	2.013	25,00
------------------------------------------------------ --------
-- UPS! Not Working
------------------------------------------------------ --------
WITH T2 (Y, V)
AS
(
SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
  FROM T1
 GROUP BY D
)
 
SELECT Y, AVG(V) AS V
  FROM T2 AS b
 GROUP BY Y;
	Y	V
1	2.014	6,00
2	2.013	15,00
3	2.014	3,00
4	2.013	25,00
------------------------------------------------------ --------
-- YEA! Working!!!
------------------------------------------------------ ---------
DROP TABLE T3;
CREATE MULTISET VOLATILE TABLE T3
AS
(
SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
  FROM T1
 GROUP BY D
) WITH DATA ON COMMIT PRESERVE ROWS;
 
SELECT Y, AVG(V) AS V
  FROM T3 AS b
 GROUP BY Y;
	Y	V
1	2.013	20,00
2	2.014	4,50
------------------------------------------------------ --------
-- Dieter eksample 1 not working
------------------------------------------------------ ---------
SELECT Y, AVG(V) AS V
  FROM (
        SELECT EXTRACT(YEAR FROM D) AS Y, SUM(V) AS V
          FROM T1
         GROUP BY d, y
       ) AS x
 GROUP BY Y;
	Y	V
1	2.014	6,00
2	2.013	15,00
3	2.014	3,00
4	2.013	25,00
------------------------------------------------------ --------
-- YEA! Dieter eksample 2 working
------------------------------------------------------ ---------
SELECT EXTRACT(YEAR FROM D) AS Y, AVG(V) AS V
  FROM (
        SELECT d, SUM(V) AS V
          FROM T1
         GROUP BY d
       ) AS x
 GROUP BY 1;
	Y	V
1	2.013	20,00
2	2.014	4,50

 

dnoeth 4628 posts Joined 11/04
22 Jul 2014

Hi Peter,
"query is not working" was a bit vague, now it's clear :-)
If there's no error, but a wrong result set, it's a bug. You should contact your DBA immediately to submit an incident to Teradata support.

Dieter

23 Jul 2014

Hi Dieder
OK.
But do you think this is a bug? Should all above give the same result?
I was afraid that I made some wrong assumptions in the code?
Peter

dnoeth 4628 posts Joined 11/04
23 Jul 2014

Hi Peter, 
of course it's a bug, all queries return the same (correct) result on a 15.00.00.03 :-) 

Dieter

23 Jul 2014

Hi Dieter
We are going to 14.xx in near future. I tried logging into the new machine and here all queries returned the same correct result. It must be a bug in 13.10 then. I do not think my DBA will use the time to submit any incidents, because the 13.10 will go out of production in near future.
I wonder how many out there are still on 13.10 and do not realize that there is an issue here!
Peter

You must sign in to leave a comment.