teradata_chela2 18 posts Joined 10/13
06 Nov 2014
SELECT RANDOM RECORDS from big table

Hi Experts ,

I want to select 10,000 random records from one big table with 8 billion records and insert them into another table with the same structure (same PI).

I have tried TOP 10000 and SAMPLE 10000.
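
Roughly, the two statements look like this (just a sketch of the shape; column list omitted, names as they appear in my explains below):

  /* sketch of attempt 1: TOP */
  INSERT INTO TARGET_DATABASE.SOURCE_TABLE_06nov
  SELECT TOP 10000 * FROM SOURCE_DATABASE.SOURCE_TABLE;

  /* sketch of attempt 2: SAMPLE */
  INSERT INTO TARGET_DATABASE.SOURCE_TABLE_06nov
  SELECT * FROM SOURCE_DATABASE.SOURCE_TABLE SAMPLE 10000;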

In both cases the explain plan shows the optimizer redistributing + sorting all 8 billion records on the PI columns and only then picking the 10,000 records. :(

Each time the query fails with an out-of-spool-space error.

Please help.

Arint

dnoeth 4628 posts Joined 11/04
06 Nov 2014

Hi Arint,
can you show your actual query and its Explain?
Both SAMPLE and TOP (without ORDER BY) should not redistribute and/or sort.
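E.g. (just a sketch with placeholder names): TOP without an ORDER BY picks an arbitrary set of rows, so no full sort should be needed, while adding an ORDER BY forces one:

  /* arbitrary 10000 rows, no global sort expected */
  SELECT TOP 10000 * FROM big_table;

  /* the whole table must be sorted before the TOP can be applied */
  SELECT TOP 10000 * FROM big_table ORDER BY some_col;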
 

Dieter

teradata_chela2 18 posts Joined 10/13
06 Nov 2014

Hi Dieter, thanks a lot for your quick response. I have pasted the plans below.
 Please pardon me if I misunderstand the query plans. Thanks again.
 Explain with TOP 10000 (without ORDER BY):
 
  1) First, we lock a distinct SOURCE_DATABASE."pseudo table"
      for read on a RowHash to prevent global deadlock for
      SOURCE_DATABASE.SOURCE_TABLE.
   2) Next, we lock a distinct TARGET_DATABASE."pseudo table" for
      write on a RowHash to prevent global deadlock for
      TARGET_DATABASE.SOURCE_TABLE_06nov.
   3) We lock SOURCE_DATABASE.SOURCE_TABLE for read,
      and we lock TARGET_DATABASE.SOURCE_TABLE_06nov for
      write.
   4) We do an all-AMPs STAT FUNCTION step from
      SOURCE_DATABASE.SOURCE_TABLE by way of an
      all-rows scan with no residual conditions into Spool 5 (Last Use),
      which is assumed to be redistributed by value to all AMPs.  The
      result rows are put into Spool 3 (all_amps) (compressed columns
      allowed), which is built locally on the AMPs.  The size is
      estimated with high confidence to be 8,419,707,707 rows (
      5,169,700,532,098 bytes).
   5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
      an all-rows scan with a condition of ("Field_118 <= 1000") into
      Spool 1 (all_amps) (compressed columns allowed), which is
      redistributed by the hash code of (
      SOURCE_DATABASE.SOURCE_TABLE.COL1,
      SOURCE_DATABASE.SOURCE_TABLE.COL2,
      SOURCE_DATABASE.SOURCE_TABLE.COL3) to all
      AMPs.  Then we do a SORT to order Spool 1 by row hash.  The result
      spool file will not be cached in memory.  The size of Spool 1 is
      estimated with high confidence to be 8,419,707,707 rows (
      5,068,664,039,614 bytes).  The estimated time for this step is 4
      hours and 6 minutes.
   6) We do an all-AMPs MERGE into
      TARGET_DATABASE.SOURCE_TABLE_06nov from Spool 1 (Last
      Use).  The size is estimated with high confidence to be
      8,419,707,707 rows.  The estimated time for this step is 76 hours
      and 30 minutes.
   7) We spoil the parser's dictionary cache for the table.
   8) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> No rows are returned to the user as the result of statement 1.
 
Explain with SAMPLE 10000 (without ORDER BY):

 
  1) First, we lock a distinct SOURCE_DATABASE."pseudo table"
      for read on a RowHash to prevent global deadlock for
      SOURCE_DATABASE.SOURCE_TABLE.
   2) Next, we lock a distinct TARGET_DATABASE."pseudo table" for
      write on a RowHash to prevent global deadlock for
      TARGET_DATABASE.SOURCE_TABLE_06nov.
   3) We lock SOURCE_DATABASE.SOURCE_TABLE for read,
      and we lock TARGET_DATABASE.SOURCE_TABLE_06nov for
      write.
   4) We do an all-AMPs SAMPLING step from
      SOURCE_DATABASE.SOURCE_TABLE by way of an
      all-rows scan with no residual conditions into Spool 1 (all_amps)
      (compressed columns allowed), which is redistributed by the hash
      code of (
      SOURCE_DATABASE.SOURCE_TABLE.COL1,
      SOURCE_DATABASE.SOURCE_TABLE.COL2,
      SOURCE_DATABASE.SOURCE_TABLE.COL3) to all
      AMPs.  Then we do a SORT to order Spool 1 by row hash.  Samples
      are specified as a number of rows.
   5) We do an all-AMPs MERGE into
      TARGET_DATABASE.SOURCE_TABLE_06nov from Spool 1 (Last
      Use).  The size is estimated with high confidence to be
      8,419,707,707 rows.  The estimated time for this step is 76 hours
      and 30 minutes.
   6) We spoil the parser's dictionary cache for the table.
   7) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> No rows are returned to the user as the result of statement 1.

Arint

dnoeth 4628 posts Joined 11/04
07 Nov 2014

Hi Arint,
the SAMPLE step does not sort, and it redistributes only the resulting rows (the estimated row count is just misleading).
The TOP indicates you're running an older version of Teradata where this was poorly optimized.
 
Better go with SAMPLE :)
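
I.e. roughly this (same names as in your explains):

  /* sketch of the recommended statement */
  INSERT INTO TARGET_DATABASE.SOURCE_TABLE_06nov
  SELECT * FROM SOURCE_DATABASE.SOURCE_TABLE SAMPLE 10000;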

Dieter

teradata_chela2 18 posts Joined 10/13
07 Nov 2014

Thanks a lot, Dieter. You are right as always. I am working on an old TD 12.00 system. :P Will go with your suggestion.

Arint
