All Forums Database
LUCAS 56 posts Joined 06/09
17 Jul 2009
SELECT TOP N - odd behaviour (v2r6)

Hello,

"SELECT TOP n from table" can be very fast, even on large tables, and is showing an odd behaviour on other tables,
with a three steps explain .. to get 10 records in one minute !

How to understand the first all-rows scan in step below ?

3) We do an all-AMPs RETRIEVE step from "tablename" by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with high
confidence to be 204,849,278 rows. The estimated time for this
step is 1 minute and 24 seconds.
4) We do an all-AMPs STAT FUNCTION step from Spool 2 by way of an
all-rows scan into Spool 5, which is built locally on the AMPs.
The result rows are put into Spool 1 (group_amps), which is built
locally on the AMPs. This step is used to retrieve the TOP 10
rows. Single AMP optimization is used. If this step retrieves
less than 10 rows, then execute step 5.
5) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 5 (Last Use), which is built
locally on the AMPs. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. This step is
used to retrieve the TOP 10 rows.
6) Finally, we send out an END TRANSACTION step to all AMPs involved ...

Random_Thought 87 posts Joined 06/09
17 Jul 2009

I am no expert here, I am just learning the "Teradata way", but I think this is repartitioning your data, doing an all AMP scan of all rows and and building a spool file on each of the amps, then doing a local all rows scan to locate the top ten on each amp then passing back to the PE?

rluebke 65 posts Joined 11/05
17 Jul 2009

It looks like you need to collect stats on the table.

Random_Thought 87 posts Joined 06/09
19 Jul 2009

@rluebke - Could you expand on this a little? what clue in the Explain do you see that leads you to think it needs more up to date Stats? I saw that it read 209 million rows, so it must have stats of some description?

@lucas could you post the query and any info on table? if it does not contravene confidentiality etc?

LUCAS 56 posts Joined 06/09
20 Jul 2009

Well,
i can report hereafter the comparison between "table_1" and "table_2" :
select TOP 10 * on the biggest one ("table_2) returns 10 rows in 1 second !
Statistics are collected and up to date for those two tables.

Table_1 (MULTISET, PI with many duplicates on PI) 205 893 673 rows
Table_2 (MULTISET, PI & PPI & NUSI) 2 988 182 783 rows

Table_1 EXPLAIN
1) First, we lock a distinct DWH_SOCLE."pseudo table" for read on a
RowHash to prevent global deadlock for "table_1".
2) Next, we lock "table_1" for read.
3) We do an all-AMPs RETRIEVE step from "table_1" by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps)
(compressed columns allowed), which is built locally on the AMPs.
The input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with high
confidence to be 205,843,087 rows. The estimated time for this
step is 1 minute and 25 seconds.
4) We do an all-AMPs STAT FUNCTION step from Spool 2 by way of an
all-rows scan into Spool 5, which is built locally on the AMPs.
The result rows are put into Spool 1 (group_amps), which is built
locally on the AMPs. This step is used to retrieve the TOP 10
rows. Single AMP optimization is used. If this step retrieves
less than 10 rows, then execute step 5.
5) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 5 (Last Use), which is built
locally on the AMPs. The result rows are put into Spool 1
(group_amps), which is built locally on the AMPs. This step is
used to retrieve the TOP 10 rows.
6) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Table_2 EXPLAIN

1) First, we lock a distinct DWH_AG."pseudo table" for read on a
RowHash to prevent global deadlock for "table_2".
2) Next, we lock "table_2" for read.
3) We do an all-AMPs STAT FUNCTION step from "table_2"
by way of an all-rows scan with no residual conditions into Spool 5,
which is built locally on the AMPs. The result rows are put
into Spool 1 (group_amps), which is built locally on the AMPs.
This step is used to retrieve the TOP 10 rows. Single AMP
optimization is used. If this step retrieves less than 10 rows,
then execute step 4.
4) We do an all-AMPs STAT FUNCTION step from "table_2"
by way of an all-rows scan with no residual conditions into Spool 5
(Last Use), which is built locally on the AMPs. The result rows
are put into Spool 1 (group_amps), which is built locally on the
AMPs. This step is used to retrieve the TOP 10 rows.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

Thanks for any suggestion

Pierre

dnoeth 4628 posts Joined 11/04
20 Jul 2009

Hi Pierre,
you should open an incident with your Teradata support.
If it's a plain "select top 10 * from table;", without ORDER/JOIN/WHERE that RETRIEVE-step should not happen.

Dieter

Dieter

LUCAS 56 posts Joined 06/09
21 Jul 2009

Hi Dieter,
yes it is a plain "SELECT TOP 10 * FROM table;"
i didn't mentionned that PI of "Table_1" is a 5 columns index (skewfact=0), when PI for "Table_2" is a 2 columns index. But this not a clue ...

Pierre

ahmadghazal 1 post Joined 07/09
29 Jul 2009

Seems like a bug if it is a simple select TOP from table.

maniamenon 10 posts Joined 05/09
30 Jan 2012

Hi Dieter,

I do face the same problem when run a select on a Big Table with "TOP" clause.

The explain says "2) Next, we do an all-AMPs STAT FUNCTION step from ...."

I have observed that when I do a select like

"SELECT TOP 100 FROM table1;

It is taking a long time too to process and delays the entire process.

when I removed the TOP 100, it started working faster.

Is the TOP 100 is the clause invokes the STAT FUNCTION ?

On huge tables, why it delays?

 

Thanks,
Mani

You must sign in to leave a comment.