Vinay151 20 posts Joined 12/14
14 Sep 2015
Tuning the SQL query

Hi,
   I have a multiset volatile table with 10k records and a table with 5 billion records. I need to create another multiset volatile table by inner joining these two tables, but the query is failing with a spool space error. Is there any way to tune this SQL, apart from increasing the spool space?
 
Thanks,
Vinay.

jlasheras 6 posts Joined 08/12
16 Sep 2015

Hi Vinay,
My first question would be whether the inner join condition applies to the Primary Index (PI) of the larger table.
Otherwise the execution plan will probably redistribute those 5 billion records to spool in preparation for the join, and that would likely exhaust your user's spool quota.
So if the join condition does not currently apply to the PI of the larger table, check whether it can; whether that is feasible depends on the volatile table's design.
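For example, a minimal sketch of that idea, assuming the larger table's PI is also its equality join column (small_src, big_table and cust_key are placeholder names, not your actual objects):
 
-- Build the volatile table on the same PI as the larger table so the
-- optimizer can join AMP-locally instead of redistributing 5 billion rows.
CREATE MULTISET VOLATILE TABLE small_vt AS (
    SELECT cust_key, ref_dt
    FROM   small_src
) WITH DATA
PRIMARY INDEX (cust_key)          -- match the PI of big_table
ON COMMIT PRESERVE ROWS;
 
-- With matching PIs the merge join can run AMP-locally, with no
-- redistribution of the 5-billion-row table to spool.
SELECT b.*
FROM   small_vt  s
INNER JOIN big_table b
        ON b.cust_key = s.cust_key;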
Skew, i.e. a bad PI distribution, could also be the problem: many rows in the result set sharing the same PI value hash to the same AMP, which can trigger the spool space error (remember that the spool quota is divided by the number of AMPs, and the error occurs as soon as any single AMP exceeds its share).
But that is a data demographics issue.
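If you want to check the distribution, a generic sketch (substitute your own table and PI/join column) is to count rows per AMP with the hash functions:
 
-- Rows per AMP for a candidate PI / join column; a few AMPs carrying far
-- more rows than the rest indicates skew (names are placeholders).
SELECT HASHAMP(HASHBUCKET(HASHROW(cust_key))) AS amp_no,
       COUNT(*)                               AS row_cnt
FROM   big_table
GROUP  BY 1
ORDER  BY row_cnt DESC;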
Normally, to diagnose a problem like this and suggest SQL tuning, one needs the EXPLAIN of the query and the DDL of the tables involved.
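Both are easy to capture, for example (placeholder query and table name):
 
-- Prefix the failing statement with EXPLAIN to get the plan...
EXPLAIN
SELECT b.*
FROM   small_vt  s
INNER JOIN big_table b
        ON b.cust_key = s.cust_key;
 
-- ...and use SHOW TABLE for the DDL of each table involved.
SHOW TABLE big_table;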
Thank you
 

ToddAWalter 316 posts Joined 10/11
05 Oct 2015

To better understand the question we would need to know the SQL conditions relating the tables. Are these conditions selective or is it expected that the entire 5 billion rows would be included in the result? If the latter, then spool space for two copies of the 5 billion rows will be necessary - one for the join result and one for the final result. And of course those both would have all the columns of the result of the join.
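One way to answer that up front, as a rough sketch with placeholder names, is to count the join result before materializing it:
 
-- If this count is near 5 billion rows (or more), two spooled copies of
-- the full result are unavoidable without tighter conditions or fewer columns.
SELECT COUNT(*)
FROM   small_vt  s
INNER JOIN big_table b
        ON b.cust_key = s.cust_key;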

Vinay151 20 posts Joined 12/14
08 Oct 2015

Hi,
 
This is the explain plan I got:
 
1) First, we lock a distinct QA_RMAP_1."pseudo table" for read on a
     RowHash to prevent global deadlock for
     QA_RMAP_1.ENROLL_SNAPSHOT_DATE.
  2) Next, we lock a distinct QA_RMAP_1."pseudo table" for read on a
     RowHash to prevent global deadlock for
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.
  3) We lock QA_RMAP_1.ENROLL_SNAPSHOT_DATE for read, and we lock
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP for read.
  4) We create the table header.
  5) We do an all-AMPs RETRIEVE step from
     QA_RMAP_1.ENROLL_SNAPSHOT_DATE by way of an all-rows scan with a
     condition of ("(QA_RMAP_1.ENROLL_SNAPSHOT_DATE.SNPSHT_ID = 2) AND
     (QA_RMAP_1.ENROLL_SNAPSHOT_DATE.SNPSHT_TYP = 1)") into Spool 2
     (all_amps), which is built locally on the AMPs.  The size of Spool
     2 is estimated with high confidence to be 156,627,426 rows (
     5,795,214,762 bytes).  The estimated time for this step is 13.76
     seconds.
  6) We do an all-AMPs JOIN step from
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP by way of a RowHash
     match scan with no residual conditions, which is joined to Spool 2
     (Last Use) by way of a RowHash match scan.
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP and Spool 2 are
     joined using a merge join, with a join condition of (
     "(QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.PROMO_SBSC_START_DT
     <= SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.PROMO_SBSC_END_DT
     > SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.BILL_START_DT <=
     SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.BILL_END_DT >
     SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.CNCT_DT <=
     SNPSHT_DT) AND
     ((QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.DISCNCT_DT >
     SNPSHT_DT) AND
     (QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP.CUST_ACCT_KEY =
     CUST_ACCT_KEY ))))))").  The input table
     QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP will not be cached
     in memory, but it is eligible for synchronized scanning.  The
     result goes into Spool 1 (all_amps), which is redistributed by
     hash code to all AMPs.  Then we do a SORT to order Spool 1 by row
     hash.  The result spool file will not be cached in memory.  The
     size of Spool 1 is estimated with low confidence to be
     17,497,152,691 rows (23,848,619,117,833 bytes).  The estimated
     time for this step is 10 hours and 17 minutes.
  7) We do an all-AMPs MERGE into
     QA_RMAP_SQLETL.TMP_SUBSCRIPTION_SNAPSHOT_JOIN_INNER_1 from Spool 1
     (Last Use).  The size is estimated with low confidence to be
     17,497,152,691 rows.  The estimated time for this step is 1,336
     hours and 9 minutes.
  8) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

prathap.s 4 posts Joined 06/14
03 Nov 2015

Hi Vinay,
1. Check for duplicate rows in the table 'REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP' on its PK columns (see the sketch after the statistics statement below).
2. Check for multi-column statistics on the combination below; statistics always play a major role.
Run the query below:

COLLECT STATS
    COLUMN (PROMO_SBSC_START_DT, PROMO_SBSC_END_DT, BILL_START_DT, BILL_END_DT, CNCT_DT, DISCNCT_DT, CUST_ACCT_KEY)
ON QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP;
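For the duplicate check in point 1, a rough sketch, assuming CUST_ACCT_KEY plus the subscription date columns are meant to form the key (adjust the column list to the real PK):
 
-- Key combinations that appear more than once are duplicates.
SELECT CUST_ACCT_KEY, PROMO_SBSC_START_DT, BILL_START_DT, CNCT_DT,
       COUNT(*) AS dup_cnt
FROM   QA_RMAP_1.REVENUE_SUBSCRIPTION_PRODUCT_NO_DUP
GROUP  BY 1, 2, 3, 4
HAVING COUNT(*) > 1;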
