All Forums Database
Win 28 posts Joined 03/13
11 Jan 2016
Spool Space Issue on More Projected Columns Than Lesser Projected Columns

What could be the cause of this?
I have the following two queries with similar source views and joins:
FIRST QUERY
The first one has more columns projected but is not returning a spool space issue: (EXPLAIN plan provided below - 5 steps)
SEL      D.PrimaryIndex
    , NonKeyColumn1, NonKeyColumn2, NonKeyColumn3, NonKeyColumn4, NonKeyColumn5
    , NonKeyColumn6, NonKeyColumn7, NonKeyColumn8, NonKeyColumn9, NonKeyColumn10
    , NonKeyColumn11, NonKeyColumn12, NonKeyColumn13, NonKeyColumn14, NonKeyColumn15
    , NonKeyColumn16, NonKeyColumn17, NonKeyColumn18, NonKeyColumn19, NonKeyColumn20
    , NonKeyColumn21, NonKeyColumn22, NonKeyColumn23, NonKeyColumn24, NonKeyColumn25
    , NonKeyColumn26, NonKeyColumn27, NonKeyColumn28, NonKeyColumn29, NonKeyColumn30
    , NonKeyColumn31, NonKeyColumn32, NonKeyColumn33, NonKeyColumn34, NonKeyColumn35
    , NonKeyColumn36, NonKeyColumn37, NonKeyColumn38, NonKeyColumn39, NonKeyColumn40
    , NonKeyColumn41, NonKeyColumn42, NonKeyColumn43, NonKeyColumn44, NonKeyColumn45
    , NonKeyColumn46, NonKeyColumn47, NonKeyColumn48, NonKeyColumn49, NonKeyColumn50
    , NonKeyColumn51
    , CASE WHEN D.NonKeyColumn52 NOT IN ('A', 'B')
                AND F.NonKeyColumn52 NOT LIKE '%A%'
                AND F.NonKeyColumn52 NOT LIKE '%B%'
            THEN 'A'
        ELSE D.NonKeyColumn52
      END NonKeyColumn52
    , NonKeyColumn53
    , CASE WHEN COALESCE(NonKeyColumn54,'A') = 'A'
            THEN 'A'
        ELSE 'B'
      END NonKeyColumn54
FROM VIEW1 D
INNER JOIN VIEW2 X
    ON D.PrimaryIndex = X.PrimaryIndex
LEFT JOIN VIEW3 F
    ON X.NotIndex = F.PrimaryIndex
;

 
SECOND QUERY
The second query has lesser columns projected but is returning a spool space issue: (EXPLAIN plan provided below - 4 steps)
SEL D.PrimaryIndex
    , NonKeyColumn1, NonKeyColumn2, NonKeyColumn3, NonKeyColumn4, NonKeyColumn5
    , NonKeyColumn6, NonKeyColumn7, NonKeyColumn8, NonKeyColumn9, NonKeyColumn10
    , CASE WHEN D.NonKeyColumn11 NOT IN ('A', 'B')
                AND F.NonKeyColumn11 NOT LIKE '%A%'
                AND F.NonKeyColumn11 NOT LIKE '%B%'
            THEN 'A'
        ELSE D.NonKeyColumn11
      END NonKeyColumn12
    , CASE WHEN COALESCE(NonKeyColumn54,'A') = 'A'
            THEN 'A'
        ELSE 'B'
      END NonKeyColumn54
FROM VIEW1 D
INNER JOIN VIEW2 X
    ON D.PrimaryIndex = X.PrimaryIndex
LEFT JOIN VIEW3 F
    ON X.NotIndex = F.PrimaryIndex
;

 
EXPLAIN PLAN FOR QUERY1
  1) First, we lock TABLE1 in view
     VIEW1 for access, we lock
     TABLE2 in view VIEW2 for
     access, and we lock TABLE3 in view
     VIEW3 for access.
  2) Next, we execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          TABLE3 in view VIEW 3 by way of an
          all-rows scan with a condition of ("NOT
          (TABLE3 in view VIEW3.PrimaryIndex IS NULL)")
          into Spool 2 (all_amps) (compressed columns allowed) fanned
          out into 2 hash join partitions, which is built locally on
          the AMPs.  The size of Spool 2 is estimated with high
          confidence to be 2,001,784 rows (390,347,880 bytes).  The
          estimated time for this step is 4.14 seconds.
       2) We do an all-AMPs RETRIEVE step from TABLE2
          in view VIEW2 by way of an all-rows scan with a condition of
          ("NOT (TABLE2 in view VIEW2.PrimaryIndex IS NULL)") into
          Spool 3 (all_amps) (compressed columns allowed) fanned out
          into 2 hash join partitions, which is redistributed by the
          hash code of (TABLE2.ColumnWithCompress) to all AMPs.
          The size of Spool 3 is estimated with high confidence to be
          102,578,207 rows (4,000,550,073 bytes).  The estimated time
          for this step is 20.75 seconds.
  3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are right outer joined using a
     hash join of 2 partitions, with condition(s) used for non-matching
     on right table ("NOT (NotIndex IS NULL)"), with a join condition of
     ("NotIndex = PrimaryIndex").  The result goes into Spool 4 (all_amps)
     (compressed columns allowed), which is redistributed by the hash
     code of (TABLE2.PrimaryIndex) to all AMPs.  Then we do
     a SORT to order Spool 4 by row hash.  The result spool file will
     not be cached in memory.  The size of Spool 4 is estimated with
     index join confidence to be 237,347,366 rows (48,656,210,030
     bytes).  The estimated time for this step is 3 minutes and 58
     seconds.
  4) We do an all-AMPs JOIN step from TABLE1 in view
     VIEW1 by way of a RowHash match scan, which
     is joined to Spool 4 (Last Use) by way of a RowHash match scan.
     TABLE1 and Spool 4 are joined using a merge join,
     with a join condition of ("TABLE1.PrimaryIndex = PrimaryIndex").
     The result goes into Spool 1 (group_amps), which is built locally
     on the AMPs.  The result spool file will not be cached in memory.
     The size of Spool 1 is estimated with index join confidence to be
     316,463,155 rows (358,552,754,615 bytes).  The estimated time for
     this step is 6 minutes and 10 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 10 minutes and 28
     seconds.

 
EXPLAIN PLAN FOR QUERY2
  1) First, we lock TABLE1 in view VIEW1 for access, we lock
     TABLE2 in view VIEW2 for access, and we lock TABLE3 in view
     VIEW3 for access.
  2) Next, we execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from
          TABLE3 in view VIEW3 by way of an
          all-rows scan with a condition of ("NOT
          (TABLE3 in view VIEW3.PrimaryIndex IS NULL)")
          into Spool 2 (all_amps) (compressed columns allowed) fanned
          out into 2 hash join partitions, which is built locally on
          the AMPs.  The size of Spool 2 is estimated with high
          confidence to be 2,001,784 rows (390,347,880 bytes).  The
          estimated time for this step is 4.14 seconds.
       2) We do an all-AMPs JOIN step from TABLE2 in
          view VIEW2 by way of a RowHash match
          scan, which is joined to TABLE1 in view
          VIEW1 by way of a RowHash match scan.
          TABLE2 and TABLE1 are joined using a merge join,
          with a join condition of (
          "TABLE1.PrimaryIndex = TABLE2.PrimaryIndex").
          The result goes into Spool 3 (all_amps)
          (compressed columns allowed) fanned out
          into 2 hash join partitions, which is redistributed by the
          hash code of (TABLE2.ColumnWithCompress) to all AMPs.
          The size of Spool 3 is estimated with low confidence to be
          136,770,943 rows (17,780,222,590 bytes).  The estimated time
          for this step is 59.29 seconds.
  3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to Spool 3 (Last Use) by way of an
     all-rows scan.  Spool 2 and Spool 3 are right outer joined using a
     hash join of 2 partitions, with condition(s) used for non-matching
     on right table ("NOT (NotIndex IS NULL)"), with a join condition of
     ("NotIndex = PrimaryIndex").  The result goes into Spool 1
     (group_amps), which is built locally on the AMPs.  The result
     spool file will not be cached in memory.  The size of Spool 1 is
     estimated with index join confidence to be 317,130,417 rows (
     47,569,562,550 bytes).  The estimated time for this step is 1
     minute and 26 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 2 minutes and 26 seconds.

Thank you in advance for any reply.

Glass 225 posts Joined 04/10
11 Jan 2016

Are your statistics current and complete?
 
Rglass

Win 28 posts Joined 03/13
14 Jan 2016

Hi Rglass,
I think that the statistics are not complete because of the "low confidence" shown in the EXPLAIN of the 2nd query; but I am not the one who is collecting the statistics nor do I have privilege to COLLECT STATS.
Thank you.
 
Best regards,
Win

Win 28 posts Joined 03/13
25 Jan 2016

I just noticed the the title on my subject is wrong. I cannot edit it; but it should be: "Spool Space Issue on LESSER Projected Columns than on More Projected Columns".

ToddAWalter 316 posts Joined 10/11
25 Jan 2016

A close look at the explain tells the story. In query 1, the content of Table1 participating in the query is much bigger since there are so many more columns included from it. The optimizer leaves Table1 in place and performs the other join first, then joins to the in place Table1. In query 2, since so much of Table1 is not participating, the optimizer can afford to do a direct merge join to Table1 and spool the result because so many columns will be projected out in the spool.
 
It seems like one of two things is happening however. Since the optimizer thinks that the spool for query2 is significantly smaller, either the statistics are not complete/current so the optimizer is not estimating correctly, or there is a lot of skew in query2 - probably on Table2.ColumnWithCompress on which spool 3 is redistributed. 
 
Statistics on Table2.columnwithcompress may help the optimizer avoid the skew. Current stats on the PI columns being joined may help with estimates of the join size. It would be good to start by investigating if the estimates for the number of rows resulting from the join are correct and if there is a lot of skew in the above column in the join result of Tab;e2 and Table1.

Win 28 posts Joined 03/13
05 Feb 2016

Hi Todd,
Thank you very much for your input.
 
Best regards,
Win

You must sign in to leave a comment.