All Forums Database
sinnet 4 posts Joined 09/14
25 Sep 2014
Data redistribution

The PI for the table is tran_id.
Can someone please let me know why  the data is being redistributed for volatile table creation when the PI is the same for the volatile table as well.
Please see Explain plans below.
CREATE SET TABLE TRAN_STAT ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      Tran_Id DECIMAL(18,0) NOT NULL,
      Tran_Line_Nbr INTEGER,
      Tran_Line_Stat_Start_Dt DATE FORMAT 'YYYY/MM/DD',
      Tran_Line_Stat_End_Dt DATE FORMAT 'YYYY/MM/DD',
      Stat_Cd VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      Ins_Batch_Id INTEGER NOT NULL,
      Updt_Batch_Id INTEGER)
PRIMARY INDEX ( Tran_Id );
Explain
sel *
 from TRAN_STAT
 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1
 
  1) First, we lock a distinct "pseudo table" for read on a
     RowHash to prevent global deadlock for TRAN_STAT.
  2) Next, we lock TRAN_STAT for read.
  3) We do an all-AMPs STAT FUNCTION step from TRAN_STAT by
     way of an all-rows scan with no residual conditions into Spool 5
     (Last Use), which is built locally on the AMPs.  The result rows
     are put into Spool 1 (group_amps), which is built locally on the
     AMPs.  The size is estimated with low confidence to be 556,288,594
     rows (41,721,644,515 bytes).
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
 
 
EXPLAIN create volatile  table vt_ts
as
(sel *
 from TRAN_STAT
 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1
 ) with data primary index (tran_id)  on commit preserve rows        ;
 
  1) First, we lock a distinct "pseudo table" for read on a
     RowHash to prevent global deadlock for TRAN_STAT.
  2) Next, we lock TRAN_STAT for read.
  3) We create the table header.
  4) We do an all-AMPs STAT FUNCTION step from TRAN_STAT by
     way of an all-rows scan with no residual conditions into Spool 5
     (Last Use), which is built locally on the AMPs.  The result rows
     are put into Spool 3 (all_amps), which is built locally on the
     AMPs.  The size is estimated with high confidence to be
     889,500,023 rows (70,270,501,817 bytes).
  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan with a condition of ("Field_9 = 1") into Spool 1
     (all_amps), which is redistributed by the hash code of (
     TRAN_STAT.Tran_Id) to all AMPs.  Then we do a SORT to
     order Spool 1 by row hash.  The result spool file will not be
     cached in memory.  The size of Spool 1 is estimated with low
     confidence to be 556,288,594 rows (32,821,027,018 bytes).  The
     estimated time for this step is 3 minutes and 7 seconds.
  6) We do an all-AMPs MERGE into vt_ts from Spool 1 (Last
     Use).  The size is estimated with low confidence to be 556,288,594
     rows.  The estimated time for this step is 1 second.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.
 

26 Sep 2014

Hi Sennet,
Check the difference in the type of operation your performing. Ur creating a SET table with just the schema for which u could see the EXPLAIN PLAIN.
Whereas when u create the volatile table the PI might be the same, but in the select ur doing a rank() over, the data is redistributed to all the AMPS for comparison to find the rank depending on the Tran_Id
 
Thanks
Jugal.

sinnet 4 posts Joined 09/14
26 Sep 2014

No, I am asking why there is data redistribution  in the second explain plan, the PI being tran_id why is it not amp local.
 
I am comparing these two explain plans.
 
Explain
sel *
 from TRAN_STAT
 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1;
 
EXPLAIN create volatile  table vt_ts
as
(sel *
 from TRAN_STAT
 qualify rank() over (partition by tran_id order by tran_line_stat_start_dt asc) = 1
 ) with data primary index (tran_id)  on commit preserve rows        ;

sinnet 4 posts Joined 09/14
29 Sep 2014

Is my question unclear or silly. Please let me know.

Raja_KT 1246 posts Joined 07/09
29 Sep 2014

Your question is clear. I ran for some of my tables 
Explain
sel *
I can see there is redistribution by hashcode.
It looks like it is upto the optimizer and the data demography.

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

ToddAWalter 316 posts Joined 10/11
30 Sep 2014

It appears that after the optimizer planned the qualify over the tran_Id it forgot that the spool was still on the amps of the hash of tran_id. It knew that for the rank/partition/qualify because it did both of those spools locally. 
Note that a step would be required there anyway to sort the spool by tran_id prior to the insert since the data has been ordered by the start date field in the spool. But a redistribution is not required for that sort step. 
It would be reasonable to submit an incident asking for an RFC to improve the optimizer for this case. 

sinnet 4 posts Joined 09/14
01 Oct 2014

Thanks. Will open an incident with Teradata.

You must sign in to leave a comment.