TrackingDaily 9 posts Joined 09/12
08 Dec 2015
very fast query + very fast query = very slow query ???

I am trying to replicate the following query, which takes under half a second, in a more dynamic way to expand its usage:

/* hard-coded Inventory Codes */
/* execution time: 0.5 second */
select
   r.*
   
from
   our_dw.sales_detail r
   
where
   r.InventoryCode IN ('15KIE01', '15KID00', '15KID01', '15KIB01', '15KIA00', '15KIB02', '15KIC00', '15KIA01', '15KIB00', '15KIC01', '15KIE00')

Fortunately, the query that returns those InventoryCodes itself takes just 0.1 second to run: 

/* query to get inventory codes from description */
/* execution time: 0.1 second */
select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%'

Therefore, I figured the combination of a 0.5-second query and a 0.1-second query would still be a very fast query. Unfortunately, the following takes 42 seconds:

/* why can't any of the following queries even come close to combining the two above??? */
/* execution time: 41.9 seconds */
select
   r.*
   
from
   our_dw.sales_detail r
   
where
   r.InventoryCode in (select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%')
   

and every other combination (JOINs, CTEs) takes even longer:

/* execution time: 149.5 seconds */
select
   r.*
   
from
   our_dw.sales_detail r inner join
      our_dw.inventory_summary s on r.InventoryCode = s.InventoryCode
   
where
   s.ItemDescription like 'Widget B%'
   
   ;
   
/* execution time: 160.9 seconds */
select
   r.*
   
from
   our_dw.sales_detail r inner join
      (select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%') s on r.InventoryCode = s.InventoryCode
   
   
   ;   

/* execution time: 195.9 seconds */
with the_codes as (select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%')

select
   r.*
   
from
   our_dw.sales_detail r inner join 
      the_codes c on r.InventoryCode = c.InventoryCode
   
   ;
   
/* execution time: 71.7 seconds */
with the_codes as (select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%')

select
   r.*
   
from
   our_dw.sales_detail r 
   
where
   r.InventoryCode in (select InventoryCode from the_codes)

   ;
   

Finally, with a temp table, I was able to get the performance I expected, but only if we avoid a JOIN:


create volatile table temp_codes as (
  select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%'
) WITH DATA ON COMMIT PRESERVE ROWS;

/* execution time: 151.8 seconds */
select
   r.*
   
from
   our_dw.sales_detail r inner join 
      temp_codes c on r.InventoryCode = c.InventoryCode
      
   ;

/* execution time: 0.5 seconds AHA! */
/* why isn't Teradata smart enough to run ALL the above queries like this ?? */
select
   r.*
   
from
   our_dw.sales_detail r 
   
where
   r.InventoryCode in (select InventoryCode from temp_codes)

Every one of these queries returns the identical dataset. Why isn't Teradata smart enough to first get the list of InventoryCodes, then execute the rest of the query as if it had the hard-coded list to begin with (i.e. just like it does in the original sample)? For the database to take so long, it must be doing table scans and lookups that aren't necessary. Can I provide "hints" to help Teradata understand what is going on and to speed things up?
We are on v.14.10.05.04 and using JDBC driver v.15.10.00.09 (I get similar results via SQL Assistant also)
Thank you.

dnoeth 4628 posts Joined 11/04
08 Dec 2015

Can you add some more details: the (P)PI and statistics of both tables, plus some Explains?

Dieter

TrackingDaily 9 posts Joined 09/12
08 Dec 2015

I am not the DBA, just the primary user, so I don't have access to detailed stats. But "inventory_summary" has 101,000 rows and "sales_detail" has 30.5 million. Both of the above are views, not direct tables. InventoryCode is a varchar(15).
I can see in the Explains that in the 1st query (with hard-coded InventoryCodes), Teradata goes straight to "sales_detail" and very quickly pulls data back one-by-one for each InventoryCode (hence the 19 steps). In the 2nd query above (just obtaining the InventoryCodes, returning in 0.1 seconds), Teradata locks and reads from inventory_summary, figures out that it only needs to look in a single partition and a single InventoryTypeID, and again is very fast. But in the 3rd query above (42 seconds), Teradata doesn't appear to look at that view at all (inventory_summary is never locked for read). In fact it seems to work backwards: Step 4 Part 1 has it retrieving from "sales_detail" before it even knows which InventoryCodes we will eventually be looking for. Is there some way to tell Teradata to slow down and do the first 2 steps one at a time, so things turn out much quicker?
1st query above:

 1) First, we lock a distinct edw_tables."pseudo table" for read on a
     RowHash to prevent global deadlock for edw_tables.r.
  2) Next, we lock a distinct edw_tables."pseudo table" for read on a
     RowHash to prevent global deadlock for edw_tables.m.
  3) We lock edw_tables.r in view our_dw.sales_detail for read, and we
     lock edw_tables.m in view our_dw.sales_detail for read.
  4) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KID00'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 196 rows.  The estimated time for this step
     is 0.01 seconds.
  5) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIE01'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 392 rows.  The estimated time for this step
     is 0.01 seconds.
  6) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KID01'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 588 rows.  The estimated time for this step
     is 0.01 seconds.
  7) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIB01'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 784 rows.  The estimated time for this step
     is 0.01 seconds.
  8) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIA00'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 980 rows.  The estimated time for this step
     is 0.01 seconds.
  9) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIB02'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 1,176 rows.  The estimated time for this
     step is 0.01 seconds.
 10) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIC00'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 1,372 rows.  The estimated time for this
     step is 0.01 seconds.
 11) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIA01'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 1,568 rows.  The estimated time for this
     step is 0.01 seconds.
 12) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIB00'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 1,764 rows.  The estimated time for this
     step is 0.01 seconds.
 13) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIC01'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  The size of Spool 4 is estimated with
     high confidence to be 1,960 rows.  The estimated time for this
     step is 0.01 seconds.
 14) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of index # 4 "edw_tables.r in view
     our_dw.sales_detail.InventoryCode = '15KIE00'" extracting row ids
     only with no residual conditions into Spool 4 (all_amps), which is
     built locally on the AMPs.  Then we do a SORT to order Spool 4 by
     row id eliminating duplicate rows.  The size of Spool 4 is
     estimated with high confidence to be 2,156 rows.  The estimated
     time for this step is 0.01 seconds.
 15) We do an all-AMPs RETRIEVE step from edw_tables.r in view
     our_dw.sales_detail by way of row ids from Spool 4 (Last Use)
     with a residual condition of ("((edw_tables.r in view
     our_dw.sales_detail.InventoryTypeID = 14) OR (edw_tables.r in view
     our_dw.sales_detail.InventoryTypeID = 1 )) AND (((edw_tables.r in
     view our_dw.sales_detail.InventoryCode )= '15KID00') OR
     (((edw_tables.r in view our_dw.sales_detail.InventoryCode )=
     '15KIE01') OR (((edw_tables.r in view
     our_dw.sales_detail.InventoryCode )= '15KID01') OR (((edw_tables.r in
     view our_dw.sales_detail.InventoryCode )= '15KIB01') OR
     (((edw_tables.r in view our_dw.sales_detail.InventoryCode )=
     '15KIA00') OR (((edw_tables.r in view
     our_dw.sales_detail.InventoryCode )= '15KIB02') OR (((edw_tables.r in
     view our_dw.sales_detail.InventoryCode )= '15KIC00') OR
     (((edw_tables.r in view our_dw.sales_detail.InventoryCode )=
     '15KIA01') OR (((edw_tables.r in view
     our_dw.sales_detail.InventoryCode )= '15KIB00') OR (((edw_tables.r in
     view our_dw.sales_detail.InventoryCode )= '15KIC01') OR
     ((edw_tables.r in view our_dw.sales_detail.InventoryCode )=
     '15KIE00')))))))))))") into Spool 5 (all_amps), which is
     redistributed by the rowkey of (edw_tables.r.InventoryCode) to all AMPs.
     Then we do a SORT to partition Spool 5 by rowkey.  The size of
     Spool 5 is estimated with low confidence to be 1,895 rows (
     109,910 bytes).  The estimated time for this step is 1.16 seconds.
 16) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a
     RowHash match scan, which is joined to a single partition of
     edw_tables.m in view our_dw.sales_detail by way of a RowHash
     match scan with a condition of ("(edw_tables.m in view
     our_dw.sales_detail.InventoryTypeID = 14) OR (edw_tables.m in view
     our_dw.sales_detail.InventoryTypeID = 1)").  Spool 5 and edw_tables.m
     are left outer joined using a rowkey-based merge join, with a join
     condition of ("(InventoryTypeID = edw_tables.m.InventoryTypeID) AND
     (InventoryCode = edw_tables.m.InventoryCode)").  The result goes into Spool
     3 (all_amps), which is built locally on the AMPs.  The size of
     Spool 3 is estimated with low confidence to be 1,895 rows (
     147,810 bytes).  The estimated time for this step is 0.01 seconds.
 17) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     edw_tables.r.CustomerID ,edw_tables.r.AccountNumber
     ,edw_tables.r.InventoryTypeID ,edw_tables.r.StoreID
     ,edw_tables.r.SalesDate ,edw_tables.r.InventoryCode
     ,edw_tables.m.InventoryDescription).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 8.  The size of Spool 8 is
     estimated with low confidence to be 1,895 rows (348,680 bytes).
     The estimated time for this step is 0.03 seconds.
 18) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with low confidence
     to be 1,895 rows (223,610 bytes).  The estimated time for this
     step is 0.01 seconds.
 19) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
     No rows are returned to the user as the result of statement 2.
     The total estimated time is 1.29 seconds.

2nd query above:

  1) First, we lock a distinct edw_tables."pseudo table" for read on a
     RowHash to prevent global deadlock for edw_tables.sales_master.
  2) Next, we lock edw_tables.sales_master in view our_dw.inventory_summary
     for read.
  3) We do an all-AMPs RETRIEVE step from a single partition of
     edw_tables.sales_master in view our_dw.inventory_summary with a
     condition of ("edw_tables.sales_master in view
     our_dw.inventory_summary.InventoryTypeID = 1") with a residual condition
     of ("(edw_tables.sales_master in view
     our_dw.inventory_summary.InventoryDescription LIKE 'Widget B%') AND
     (edw_tables.sales_master in view our_dw.inventory_summary.InventoryTypeID =
     1)") into Spool 1 (group_amps), which is built locally on the AMPs.
     The size of Spool 1 is estimated with no confidence to be 91,068
     rows (2,549,904 bytes).  The estimated time for this step is 0.04
     seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
     No rows are returned to the user as the result of statement 2.
     The total estimated time is 0.04 seconds.

3rd query above:

  1) First, we lock a distinct edw_tables."pseudo table" for read on a
     RowHash to prevent global deadlock for edw_tables.r.
  2) Next, we lock a distinct edw_tables."pseudo table" for read on a
     RowHash to prevent global deadlock for edw_tables.sales_master.
  3) We lock edw_tables.r in view our_dw.sales_detail for read, and we
     lock edw_tables.sales_master for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from 2 partitions of
          edw_tables.r in view our_dw.sales_detail with a condition of
          ("(edw_tables.r in view our_dw.sales_detail.InventoryTypeID = 14)
          OR (edw_tables.r in view our_dw.sales_detail.InventoryTypeID = 1)")
          into Spool 4 (all_amps) fanned out into 4 hash join
          partitions, which is built locally on the AMPs.  The input
          table will not be cached in memory, but it is eligible for
          synchronized scanning.  The size of Spool 4 is estimated with
          low confidence to be 48,374,730 rows (2,805,734,340 bytes).
          The estimated time for this step is 41.26 seconds.
       2) We do an all-AMPs RETRIEVE step from a single partition of
          edw_tables.sales_master with a condition of (
          "edw_tables.sales_master.InventoryTypeID = 1") with a residual
          condition of ("(edw_tables.sales_master.InventoryDescription LIKE
          'Widget B%') AND (edw_tables.sales_master.InventoryTypeID = 1)")
          into Spool 6 (all_amps), which is built locally on the AMPs.
          Then we do a SORT to order Spool 6 by the sort key in spool
          field1 (edw_tables.sales_master.InventoryCode) eliminating
          duplicate rows.  The size of Spool 6 is estimated with no
          confidence to be 91,068 rows (3,278,448 bytes).  The
          estimated time for this step is 0.21 seconds.
  5) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by
          way of an all-rows scan into Spool 5 (all_amps) fanned out
          into 4 hash join partitions, which is duplicated on all AMPs.
          The size of Spool 5 is estimated with no confidence to be
          1,092,816 rows (39,341,376 bytes).
       2) We do an all-AMPs RETRIEVE step from a single partition of
          edw_tables.m in view our_dw.sales_detail with a condition of
          ("(edw_tables.m in view our_dw.sales_detail.InventoryTypeID = 14)
          OR (edw_tables.m in view our_dw.sales_detail.InventoryTypeID = 1)")
          into Spool 7 (all_amps), which is redistributed by the hash
          code of (edw_tables.m.InventoryTypeID, edw_tables.m.InventoryCode) to all
          AMPs.  The size of Spool 7 is estimated with low confidence
          to be 101,186 rows (4,249,812 bytes).  The estimated time for
          this step is 0.11 seconds.
  6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
     all-rows scan, which is joined to Spool 5 (Last Use) by way of an
     all-rows scan.  Spool 4 and Spool 5 are joined using a inclusion
     hash join of 4 partitions, with a join condition of ("InventoryCode =
     InventoryCode").  The result goes into Spool 8 (all_amps), which is
     redistributed by the hash code of (edw_tables.r.InventoryTypeID,
     edw_tables.r.InventoryCode) to all AMPs.  The size of Spool 8 is
     estimated with no confidence to be 46,103,186 rows (2,673,984,788
     bytes).  The estimated time for this step is 48.87 seconds.
  7) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an
     all-rows scan, which is joined to Spool 8 (Last Use) by way of an
     all-rows scan.  Spool 7 and Spool 8 are right outer joined using a
     single partition hash join, with condition(s) used for
     non-matching on right table ("NOT (InventoryCode IS NULL)"), with a
     join condition of ("(InventoryTypeID = InventoryTypeID) AND (InventoryCode =
     InventoryCode)").  The result goes into Spool 3 (all_amps), which is
     built locally on the AMPs.  The size of Spool 3 is estimated with
     no confidence to be 46,103,187 rows (3,596,048,586 bytes).  The
     estimated time for this step is 31.12 seconds.
  8) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     edw_tables.r.CustomerID ,edw_tables.r.AccountNumber
     ,edw_tables.r.InventoryTypeID ,edw_tables.r.StoreID
     ,edw_tables.r.SalesDate ,edw_tables.r.InventoryCode
     ,edw_tables.m.InventoryDescription).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 11.  The aggregate spool
     file will not be cached in memory.  The size of Spool 11 is
     estimated with no confidence to be 34,577,391 rows (6,362,239,944
     bytes).  The estimated time for this step is 4 minutes and 39
     seconds.
  9) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 34,577,391 rows (4,080,132,138 bytes).  The estimated time
     for this step is 25.61 seconds.
 10) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
     No rows are returned to the user as the result of statement 2.

 

ToddAWalter 316 posts Joined 10/11
08 Dec 2015

If you look closely at the step that implements edw_tables.sales_master.InventoryDescription LIKE 'Widget B%', you will see an estimate of "with no confidence to be 91,068 rows". With that large an estimate, the optimizer builds a plan that is very "safe", in that it will work for any size of return set without causing really bad performance if the estimate turns out to be too low. Unfortunately, the safe plan misses the fast plan when the estimate is a large over-estimate.
 
Are there statistics on InventoryDescription? That would give the optimizer some chance of estimating the number of rows selected by the LIKE condition. However, even that is not a guarantee since LIKE is very hard to estimate properly.
 
When you place the results in the temp table, we get to count the resulting rows before making the rest of the plan, so we can use the fast plan when we know only a few rows have been selected.
 
In addition to collecting stats on InventoryDescription, can you provide the explain plan when you use the temp table?
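For example, note that statistics are collected on the base table behind the view, not the view itself. Assuming the base table and column names from your Explain output (which shows the view's ItemDescription mapping to edw_tables.sales_master.InventoryDescription), the collection would look something like:

/* sketch only: base table/column names assumed from the Explain output above */
COLLECT STATISTICS COLUMN (InventoryDescription) ON edw_tables.sales_master;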

TrackingDaily 9 posts Joined 09/12
09 Dec 2015

Thanks Todd, that is interesting. Note that the step you mention (step 4 part 2) is actually the same as step 3 of query 2, which takes a more efficient path (0.04 seconds vs 0.21 seconds). I can live with a few extra milliseconds, but it is interesting that one plan is 5x faster.
It seems to me that Teradata is getting fancy with its concept of "parallelism", using the opportunity to retrieve from the final, huge table (sales_detail) "in parallel" with the "LIKE 'Widget B%'" subquery (i.e. Step 4 parts 1 and 2). This may work conceptually but is horrible in reality -- a 0.21-second query and a 41-second query are not, practically speaking, processing "in parallel"; when one part is finished the other still has 99+% of the way to go.
Finally, all the steps after #5 just seem like an attempt to fix an execution path gone awry: steps 6 and 7 require JOINs, step 8 does an aggregate sum, all with longer and longer execution-time estimates (in reality they are overstated -- the entire query takes 40-60 seconds -- but still, way too long).
Below is the execution plan after creating a temp table (and using the IN clause). You're correct that in this case, Teradata recognizes that the row counts will be much smaller.

1) First, we lock a distinct big_edw."pseudo table" for read on a
     RowHash to prevent global deadlock for big_edw.r.
  2) Next, we lock a distinct big_edw."pseudo table" for read on a
     RowHash to prevent global deadlock for big_edw.m.
  3) We lock big_edw.r in view our_dw.sales_detail for read, and we
     lock big_edw.m in view our_dw.sales_detail for read.
  4) We do an all-AMPs RETRIEVE step from JSMITH.temp_codes by way
     of an all-rows scan with no residual conditions into Spool 5
     (all_amps), which is built locally on the AMPs.  Then we do a SORT
     to order Spool 5 by the sort key in spool field1 (
     JSMITH.temp_codes.InventoryCode) eliminating duplicate rows.  The
     size of Spool 5 is estimated with high confidence to be 12 rows (
     432 bytes).  The estimated time for this step is 0.01 seconds.
  5) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 4 (all_amps), which is duplicated on
     all AMPs.  Then we do a SORT to order Spool 4 by the hash code of
     (JSMITH.temp_codes.InventoryCode).  The size of Spool 4 is
     estimated with high confidence to be 144 rows (5,184 bytes).
  6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
     all-rows scan, which is joined to big_edw.r in view
     our_dw.sales_detail by way of a traversal of index # 4 without
     accessing the base table extracting row ids only.  Spool 4 and
     big_edw.r are joined using a nested join, with a join condition
     of ("big_edw.r.InventoryCode = InventoryCode").  The input table
     big_edw.r will not be cached in memory.  The result goes into
     Spool 6 (all_amps), which is built locally on the AMPs.  Then we
     do a SORT to order Spool 6 by field Id 1.  The size of Spool 6 is
     estimated with low confidence to be 6,075 rows (182,250 bytes).
     The estimated time for this step is 0.07 seconds.
  7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
     all-rows scan, which is joined to 2 partitions of big_edw.r in
     view our_dw.sales_detail with a condition of ("(big_edw.r in
     view our_dw.sales_detail.InventoryTypeID = 14) OR (big_edw.r in
     view our_dw.sales_detail.InventoryTypeID = 1)").  Spool 6 and
     big_edw.r are joined using a row id join, with a join condition
     of ("big_edw.r.InventoryCode = InventoryCode").  The input table
     big_edw.r will not be cached in memory.  The result goes into
     Spool 7 (all_amps), which is redistributed by the rowkey of (
     big_edw.r.InventoryCode) to all AMPs.  Then we do a SORT to partition
     Spool 7 by rowkey.  The size of Spool 7 is estimated with low
     confidence to be 6,075 rows (352,350 bytes).  The estimated time
     for this step is 3.20 seconds.
  8) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of a
     RowHash match scan, which is joined to a single partition of
     big_edw.m in view our_dw.sales_detail by way of a RowHash
     match scan with a condition of ("(big_edw.m in view
     our_dw.sales_detail.InventoryTypeID = 14) OR (big_edw.m in view
     our_dw.sales_detail.InventoryTypeID = 1)").  Spool 7 and big_edw.m
     are left outer joined using a rowkey-based merge join, with
     condition(s) used for non-matching on left table ("NOT (InventoryCode
     IS NULL)"), with a join condition of ("(InventoryTypeID =
     big_edw.m.InventoryTypeID) AND (InventoryCode = big_edw.m.InventoryCode)").
     The result goes into Spool 3 (all_amps), which is built locally on
     the AMPs.  The size of Spool 3 is estimated with low confidence to
     be 6,075 rows (473,850 bytes).  The estimated time for this step
     is 0.02 seconds.
  9) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     big_edw.r.CustomerID ,big_edw.r.AccountNumber
     ,big_edw.r.InventoryTypeID ,big_edw.r.StoreID
     ,big_edw.r.SalesDate ,big_edw.r.InventoryCode
     ,big_edw.m.InventoryDescription).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 10.  The size of Spool 10
     is estimated with low confidence to be 6,075 rows (1,117,800
     bytes).  The estimated time for this step is 0.03 seconds.
 10) We do an all-AMPs RETRIEVE step from Spool 10 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with low confidence
     to be 6,075 rows (716,850 bytes).  The estimated time for this
     step is 0.02 seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
     No rows are returned to the user as the result of statement 2.

 

talk2soumya 28 posts Joined 11/11
09 Dec 2015

Can you try the query below:
 

select
   r.*
 
from
   our_dw.sales_detail r
    
where
   r.InventoryCode in (select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%' group by 1);
   

 

TrackingDaily 9 posts Joined 09/12
09 Dec 2015

talk2soumya, adding "group by 1" had no effect on the execution plan; if anything it added a few seconds to the execution time (46 vs 52 seconds).
I do appreciate everyone's input on this.

kirthi 65 posts Joined 02/12
09 Dec 2015

Please provide the table structures and statistics information for both tables, plus the explain plans for the 2 SQLs below:

/* execution time: 151.8 seconds */
select
   r.*
from
   our_dw.sales_detail r inner join
      temp_codes c on r.InventoryCode = c.InventoryCode
   ;

/* execution time: 0.5 seconds AHA! */
/* why isn't Teradata smart enough to run ALL the above queries like this ?? */
select
   r.*
from
   our_dw.sales_detail r
where
   r.InventoryCode in (select InventoryCode from temp_codes)

TrackingDaily 9 posts Joined 09/12
09 Dec 2015

kirthi, the execution plan for the version with the temp table is posted in my "Thanks Todd" post. (Admins: an essential part of a forum is the ability to uniquely identify and link to specific posts!) I will try to post the other tomorrow.
All, what commands should I be running to get the statistics you're requesting?

kirthi 65 posts Joined 02/12
09 Dec 2015

Please provide the output of the following:

SHOW STATISTICS ON <table_name>;
HELP STATISTICS <table_name>;
SHOW TABLE <table_name>;

keithleyner 1 post Joined 12/12
10 Dec 2015

Try this:

select
   r.*
from
   our_dw.sales_detail r inner join
      our_dw.inventory_summary s
         on r.InventoryCode = s.InventoryCode
         and s.ItemDescription like 'Widget B%';

ToddAWalter 316 posts Joined 10/11
10 Dec 2015

- "Do steps in parallel" in the explain is a different kind of parallelism. It just means that the optimizer has decided it can execute those steps simultaneously rather than sequentially. You are correct that the step following the parallel block cannot start until the longest of the "parallel" steps has finished. But this has nothing to do with why the plan is built the way it is.
 
- If you look closely at the plan for your temp table case and the plan for specifying the codes as constants, you will see that the commonality in the fast plans is that the NUSI on InventoryCode is used to access the big table. In both of these fast cases, the optimizer knows there is a very small number of codes to retrieve, and thus the NUSI is a great plan. In the slower plans, the commonality is a full table join to the big table without using the index. In every one of those cases, the estimate for the result of the LIKE expression is a big number of codes (91,068) rather than 11. This large number makes the NUSI look very expensive to the optimizer, so it builds the plan very differently, setting up for a hash join to the large table.
 
- I did not see an answer to my question about whether there are statistics on the Description column you are doing the LIKE upon?

TrackingDaily 9 posts Joined 09/12
10 Dec 2015

The entire inventory_summary view has about 101,000 records. For an execution plan to assume that a LIKE filter will return 91,000, or 90%, of the records is both impractical in real-world usage and highly improbable. Even something as wide-reaching as WHERE LastName LIKE '%e%' would not return 90% of the records in any real-world database. It's hard to think of a LIKE filter that returns 90% of records unless the vast majority share a specific characteristic in their lettering; most people who just wanted to exclude 10% of their records would use a negative filter (i.e. WHERE InventoryCode NOT LIKE 'B%' or WHERE SUBSTRING(InventoryCode FROM 1 FOR 1) <> 'B').
I'm going on record as saying this is a flaw in the execution engine: if it must make an assumption, it should assume that a LIKE filter returns far less than 90% of the records in a table. Secondly, why isn't the engine smart enough to determine that number from the start? If the LIKE really will return 91,000 records, we're going to be here a long time anyway, so an extra half second isn't a huge deal. But it might find, as in my query, that it will only obtain 11 records, in which case total execution time can be reduced by about 95%. So that's a second flaw, in my opinion: not only does the engine make a bad assumption about the results of a LIKE (flaw #1), it believes it so strongly that it refuses to take half a second to confirm or refute the assumption.
 
Please tell me how to obtain the stats on the Description column that you're requesting, and I will try to post them.
 
 

ToddAWalter 316 posts Joined 10/11
11 Dec 2015

HELP STATISTICS <database>.<table>; will tell you the current status of statistics on the table.
 
If there are not statistics on that column:
COLLECT STATISTICS COLUMN (<database>.<table>.<column>);
 
If you do not have the rights to collect the statistics, it might be necessary to ask your DBAs to collect them.

ToddAWalter 316 posts Joined 10/11
11 Dec 2015

It is very likely that there are no statistics on the Description column. 90% is the estimate that the optimizer uses in the absence of statistics. I respect your opinion regarding this default estimate, but our experience has shown that LIKE expression selectivity varies very widely. Without statistics, the optimizer is designed to use estimates that are purposely pessimistic in order to produce "safe" plans. In this case, if we were to estimate a small selectivity and use that to build a plan that accesses via the index, then if the selectivity was actually higher, the execution cost of the query would be much higher. That is what we are trying to avoid. The actual percentage is not that important: we would need to estimate a very small selectivity to make the optimizer choose the index-based plan, since NUSI access is only valuable when it does less access to the data blocks than a full scan/join.
 
Until very recently, the optimizer had no ability to partially execute a query to determine actuals. In 15.0 and 15.10 this type of capability has been introduced. So far it covers specific cases, e.g. single-value subqueries. In those cases the optimizer will not only partially execute the query but will also take the computed values and compile them back into the plan as constants. This capability will continue to be extended to other types of subqueries and expressions over the next several releases. To date it has not been applied to LIKE expressions, but that is certainly a future possibility.
 
In the meantime, collecting statistics has some chance of helping. It is not guaranteed because, as noted above, the estimate needs to get quite selective before it can choose the index plan. LIKE is particularly hard to estimate accurately from the statistics histogram.
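As a related aid: if you want to see which statistics the optimizer itself thinks would help, Teradata's DIAGNOSTIC HELPSTATS facility appends suggested COLLECT STATISTICS statements to the Explain output. A sketch (exact output varies by release):

/* ask the optimizer to list the statistics it would like to have */
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
select r.*
from our_dw.sales_detail r
where r.InventoryCode in (select InventoryCode from our_dw.inventory_summary
                          where ItemDescription like 'Widget B%');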
