All Forums Database
AnneTemp 2 posts Joined 02/14
02 Feb 2014
GROUP BY Processing in TD 14

Hi,
I wanted to know how the GROUP BY is processed in TD14. I've read some posts and blogs about GROUP BY vs DISTINCT in pre-TD13 like:
http://forums.teradata.com/forum/database/spool- treatment-for-distinct-versus-group-by
But I couldn't find any discussion on how it is done in TD14. I wanted to know if there was a way to estimate or know the spool usage for the aggregate step like how Sir Dnoeth explained it.
Would appreciate any feedback or even just a link or document which could provide explanation.
 
Thanks,
Anne

Adeel Chaudhry 773 posts Joined 04/08
02 Feb 2014

GROUP BY for TD14 is explained on following link:
 
http://www.info.teradata.com/htmlpubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1146_111A/ch01.032.178.html

-- If you are stuck at something .... consider it an opportunity to think anew.

Adeel Chaudhry 773 posts Joined 04/08
02 Feb 2014

For a similar description .... wait for Dieter's reply. :)

-- If you are stuck at something .... consider it an opportunity to think anew.

AnneTemp 2 posts Joined 02/14
05 Feb 2014

Thank you so much for your reply (^_^)
I didn't find the spool info I needed but will still try searching for it using the link you gave (^_^)

dnoeth 4628 posts Joined 11/04
09 Feb 2014

Since TD13 there are multiple algorithms to do aggregation, some are indicated in Explain by "skippiong local/global cache", etc.
Plus the optimizer automatically decides if DISTINCT or GROUP BY is more efficient.
And as there's no way to force the optimizer to use one specific algorithm, you can only check estimated spool size in Explain and/or DBQL steps.

Dieter

sderecho 2 posts Joined 02/14
10 Feb 2014

Hi all-
 
One quick question:
 
What is the difference between select distinct and group by?
 
i.e. select distinct field from table  vs. select field from table group by 1??
 
Will these two queries pull the same result? Is the goup by more efficient?
 
Thanks,
Sam

M.Saeed Khurram 544 posts Joined 09/12
11 Feb 2014

The queries will pull the same result but there is processing difference between these two queries. Distinct will first sort and then eliminate duplicates, while group by do not sort. So group by is more efficient.
 

Khurram

sderecho 2 posts Joined 02/14
11 Feb 2014

Great- thank you for this clear answer, Khurram.
 
-Sam

dnoeth 4628 posts Joined 11/04
11 Feb 2014

Depending on the algorithm GROUP BY also sorts: ARSA - Aggregate-Redistribute-Sort-Aggregate.
And GROUP BY is not always more efficient, there are cases where DISTINCT is faster, if the rows per value is lower than number of AMPs.
That's why the optimizer decides since TD13 which is more effcient based on statistics.
I posted some comparison in the thread mentioned in the first post:
http://forums.teradata.com/forum/database/spool-treatment-for-distinct-versus-group-by

Dieter

You must sign in to leave a comment.