Can you add some more details -- the (P)PI and statistics of both tables, plus some Explains?

Dieter

I am not the DBAdmin, just the primary user, so I don't have access to detailed stats. But "inventory_summary" has 101,000 rows and "sales_detail" has 30.5 million. Both of the above are views, not direct tables. InventoryCode is a varchar(15).

I can see in the Explains that in the 1st query (with hard-coded InventoryCodes), Teradata goes straight to "sales_detail" and very quickly pulls data back one by one for each InventoryCode (hence the 19 steps). In the 2nd query above (just obtaining the InventoryCodes; it returns in 0.1 seconds), Teradata locks and reads from inventory_summary, figures out that it only needs to look in a single partition and a single InventoryTypeID, and again is very fast. But in the 3rd query above (42 seconds), Teradata doesn't appear to look at that view at all (inventory_summary is never locked for read) -- in fact it seems to work backwards, as Step 4 Part 1 has it retrieving from "sales_detail" before it even knows which InventoryCodes we will eventually be looking for. Is there some way to tell Teradata to slow down? If it did the first two steps one at a time, things would turn out much quicker.
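For reference, judging from the Explain text below, the three queries presumably look roughly like the sketches that follow. This is a reconstruction, not the actual SQL: the plans show the real queries also join to a description table and do a SUM/GROUP BY, which I've omitted, and `r.*` and the code list are placeholders.

```sql
/* 1st query: hard-coded InventoryCodes (fast, ~1 second) */
select r.*
from our_dw.sales_detail r
where r.InventoryTypeID in (1, 14)
  and r.InventoryCode in ('15KID00', '15KIE01', '15KID01', '15KIB01', '15KIA00',
                          '15KIB02', '15KIC00', '15KIA01', '15KIB00', '15KIC01',
                          '15KIE00');

/* 2nd query: just the codes (0.1 seconds) */
select InventoryCode
from our_dw.inventory_summary
where InventoryTypeID = 1
  and InventoryDescription like 'Widget B%';

/* 3rd query: the two combined (42 seconds) */
select r.*
from our_dw.sales_detail r
where r.InventoryTypeID in (1, 14)
  and r.InventoryCode in (select InventoryCode
                          from our_dw.inventory_summary
                          where InventoryTypeID = 1
                            and InventoryDescription like 'Widget B%');
```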

1st query above:

1) First, we lock a distinct edw_tables."pseudo table" for read on a RowHash to prevent global deadlock for edw_tables.r. 2) Next, we lock a distinct edw_tables."pseudo table" for read on a RowHash to prevent global deadlock for edw_tables.m. 3) We lock edw_tables.r in view our_dw.sales_detail for read, and we lock edw_tables.m in view our_dw.sales_detail for read. 4) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KID00'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 196 rows. The estimated time for this step is 0.01 seconds. 5) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIE01'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 392 rows. The estimated time for this step is 0.01 seconds. 6) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KID01'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 588 rows. The estimated time for this step is 0.01 seconds. 7) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIB01'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 784 rows. The estimated time for this step is 0.01 seconds. 
8) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIA00'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 980 rows. The estimated time for this step is 0.01 seconds. 9) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIB02'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 1,176 rows. The estimated time for this step is 0.01 seconds. 10) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIC00'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 1,372 rows. The estimated time for this step is 0.01 seconds. 11) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIA01'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 1,568 rows. The estimated time for this step is 0.01 seconds. 12) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIB00'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 1,764 rows. 
The estimated time for this step is 0.01 seconds. 13) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIC01'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. The size of Spool 4 is estimated with high confidence to be 1,960 rows. The estimated time for this step is 0.01 seconds. 14) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of index # 4 "edw_tables.r in view our_dw.sales_detail.InventoryCode = '15KIE00'" extracting row ids only with no residual conditions into Spool 4 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 4 by row id eliminating duplicate rows. The size of Spool 4 is estimated with high confidence to be 2,156 rows. The estimated time for this step is 0.01 seconds. 15) We do an all-AMPs RETRIEVE step from edw_tables.r in view our_dw.sales_detail by way of row ids from Spool 4 (Last Use) with a residual condition of ("((edw_tables.r in view our_dw.sales_detail.InventoryTypeID = 14) OR (edw_tables.r in view our_dw.sales_detail.InventoryTypeID = 1 )) AND (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KID00') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIE01') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KID01') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIB01') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIA00') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIB02') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIC00') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIA01') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIB00') OR (((edw_tables.r in view our_dw.sales_detail.InventoryCode )= '15KIC01') OR ((edw_tables.r in view 
our_dw.sales_detail.InventoryCode )= '15KIE00')))))))))))") into Spool 5 (all_amps), which is redistributed by the rowkey of (edw_tables.r.InventoryCode) to all AMPs. Then we do a SORT to partition Spool 5 by rowkey. The size of Spool 5 is estimated with low confidence to be 1,895 rows ( 109,910 bytes). The estimated time for this step is 1.16 seconds. 16) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a RowHash match scan, which is joined to a single partition of edw_tables.m in view our_dw.sales_detail by way of a RowHash match scan with a condition of ("(edw_tables.m in view our_dw.sales_detail.InventoryTypeID = 14) OR (edw_tables.m in view our_dw.sales_detail.InventoryTypeID = 1)"). Spool 5 and edw_tables.m are left outer joined using a rowkey-based merge join, with a join condition of ("(InventoryTypeID = edw_tables.m.InventoryTypeID) AND (InventoryCode = edw_tables.m.InventoryCode)"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 1,895 rows ( 147,810 bytes). The estimated time for this step is 0.01 seconds. 17) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan , grouping by field1 ( edw_tables.r.CustomerID ,edw_tables.r.AccountNumber ,edw_tables.r.InventoryTypeID ,edw_tables.r.StoreID ,edw_tables.r.SalesDate ,edw_tables.r.InventoryCode ,edw_tables.m.InventoryDescription). Aggregate Intermediate Results are computed globally, then placed in Spool 8. The size of Spool 8 is estimated with low confidence to be 1,895 rows (348,680 bytes). The estimated time for this step is 0.03 seconds. 18) We do an all-AMPs RETRIEVE step from Spool 8 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 1,895 rows (223,610 bytes). The estimated time for this step is 0.01 seconds. 
19) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. No rows are returned to the user as the result of statement 2. The total estimated time is 1.29 seconds.

2nd query above:

1) First, we lock a distinct edw_tables."pseudo table" for read on a RowHash to prevent global deadlock for edw_tables.sales_master. 2) Next, we lock edw_tables.sales_master in view our_dw.inventory_summary for read. 3) We do an all-AMPs RETRIEVE step from a single partition of edw_tables.sales_master in view our_dw.inventory_summary with a condition of ("edw_tables.sales_master in view our_dw.inventory_summary.InventoryTypeID = 1") with a residual condition of ("(edw_tables.sales_master in view our_dw.inventory_summary.InventoryDescription LIKE 'Widget B%') AND (edw_tables.sales_master in view our_dw.inventory_summary.InventoryTypeID = 1)") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 91,068 rows (2,549,904 bytes). The estimated time for this step is 0.04 seconds. 4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. No rows are returned to the user as the result of statement 2. The total estimated time is 0.04 seconds.

3rd query above:

1) First, we lock a distinct edw_tables."pseudo table" for read on a RowHash to prevent global deadlock for edw_tables.r. 2) Next, we lock a distinct edw_tables."pseudo table" for read on a RowHash to prevent global deadlock for edw_tables.sales_master. 3) We lock edw_tables.r in view our_dw.sales_detail for read, and we lock edw_tables.sales_master for read. 4) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from 2 partitions of edw_tables.r in view our_dw.sales_detail with a condition of ("(edw_tables.r in view our_dw.sales_detail.InventoryTypeID = 14) OR (edw_tables.r in view our_dw.sales_detail.InventoryTypeID = 1)") into Spool 4 (all_amps) fanned out into 4 hash join partitions, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The size of Spool 4 is estimated with low confidence to be 48,374,730 rows (2,805,734,340 bytes). The estimated time for this step is 41.26 seconds. 2) We do an all-AMPs RETRIEVE step from a single partition of edw_tables.sales_master with a condition of ( "edw_tables.sales_master.InventoryTypeID = 1") with a residual condition of ("(edw_tables.sales_master.InventoryDescription LIKE 'Widget B%') AND (edw_tables.sales_master.InventoryTypeID = 1)") into Spool 6 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 6 by the sort key in spool field1 (edw_tables.sales_master.InventoryCode) eliminating duplicate rows. The size of Spool 6 is estimated with no confidence to be 91,068 rows (3,278,448 bytes). The estimated time for this step is 0.21 seconds. 5) We execute the following steps in parallel. 1) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of an all-rows scan into Spool 5 (all_amps) fanned out into 4 hash join partitions, which is duplicated on all AMPs. The size of Spool 5 is estimated with no confidence to be 1,092,816 rows (39,341,376 bytes). 
2) We do an all-AMPs RETRIEVE step from a single partition of edw_tables.m in view our_dw.sales_detail with a condition of ("(edw_tables.m in view our_dw.sales_detail.InventoryTypeID = 14) OR (edw_tables.m in view our_dw.sales_detail.InventoryTypeID = 1)") into Spool 7 (all_amps), which is redistributed by the hash code of (edw_tables.m.InventoryTypeID, edw_tables.m.InventoryCode) to all AMPs. The size of Spool 7 is estimated with low confidence to be 101,186 rows (4,249,812 bytes). The estimated time for this step is 0.11 seconds. 6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to Spool 5 (Last Use) by way of an all-rows scan. Spool 4 and Spool 5 are joined using a inclusion hash join of 4 partitions, with a join condition of ("InventoryCode = InventoryCode"). The result goes into Spool 8 (all_amps), which is redistributed by the hash code of (edw_tables.r.InventoryTypeID, edw_tables.r.InventoryCode) to all AMPs. The size of Spool 8 is estimated with no confidence to be 46,103,186 rows (2,673,984,788 bytes). The estimated time for this step is 48.87 seconds. 7) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of an all-rows scan, which is joined to Spool 8 (Last Use) by way of an all-rows scan. Spool 7 and Spool 8 are right outer joined using a single partition hash join, with condition(s) used for non-matching on right table ("NOT (InventoryCode IS NULL)"), with a join condition of ("(InventoryTypeID = InventoryTypeID) AND (InventoryCode = InventoryCode)"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with no confidence to be 46,103,187 rows (3,596,048,586 bytes). The estimated time for this step is 31.12 seconds. 
8) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan , grouping by field1 ( edw_tables.r.CustomerID ,edw_tables.r.AccountNumber ,edw_tables.r.InventoryTypeID ,edw_tables.r.StoreID ,edw_tables.r.SalesDate ,edw_tables.r.InventoryCode ,edw_tables.m.InventoryDescription). Aggregate Intermediate Results are computed globally, then placed in Spool 11. The aggregate spool file will not be cached in memory. The size of Spool 11 is estimated with no confidence to be 34,577,391 rows (6,362,239,944 bytes). The estimated time for this step is 4 minutes and 39 seconds. 9) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 34,577,391 rows (4,080,132,138 bytes). The estimated time for this step is 25.61 seconds. 10) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. No rows are returned to the user as the result of statement 2.

If you look closely at the step that implements edw_tables.sales_master.InventoryDescription LIKE 'Widget B%', you will see an estimate of "with no confidence to be 91,068 rows". When it has that large estimate, the optimizer builds a plan that is very "safe" in that it will work for any size of a return set without causing really bad performance if it is underestimated. Unfortunately, the safe plan misses the fast plan when the estimate is way over-estimated.

Are there statistics on InventoryDescription? That would give the optimizer some chance of estimating the number of rows selected by the LIKE condition. However, even that is not a guarantee, since LIKE is very hard to estimate properly.

When you place the results in the temp table, we get to count the resulting rows before making the rest of the plan, so we can use the fast plan when we know only a few rows have been selected.

In addition to collecting stats on InventoryDescription, can you provide the explain plan when you use the temp table?
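A sketch of what that temp-table approach could look like (assuming a volatile table is acceptable; `temp_codes` is the name used later in this thread, and the options shown are standard Teradata defaults rather than anything specified here):

```sql
/* materialize the small code list first, so its row count is known
   before the rest of the plan is built */
create volatile table temp_codes as
( select InventoryCode
  from our_dw.inventory_summary
  where InventoryDescription like 'Widget B%'
) with data
on commit preserve rows;

/* then run the big query against the known-small list */
select r.*
from our_dw.sales_detail r
where r.InventoryCode in (select InventoryCode from temp_codes);
```

A permanent table in the user's own database (as in the `JSMITH.temp_codes` plan later in this thread) would work the same way; the key point is that the row count is real, not estimated.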

Thanks Todd, that is interesting. Note that the step you mention (step 4 part 2) is actually the same as query 2, step 3, and that takes a more efficient path (0.04 seconds vs 0.21 seconds). I can live with a few extra milliseconds but it is interesting that one plan is 5x faster.

It seems to me that Teradata is getting fancy with its concept of "parallelism", using the opportunity to retrieve from the final, huge table (sales_detail) "in parallel" with the "LIKE 'Widget B%'" subquery (i.e., Step 4 parts 1 and 2). This may work conceptually but is horrible in reality -- a 0.21-second query and a 41-second query are not, practically speaking, processing "in parallel" -- when one part is finished, the other still has 99+% of the way to go.

Finally, all the steps after #5 just seem like an attempt to fix an execution path gone awry: steps 6 and 7 require JOINs, and step 8 does an aggregate sum, all with longer and longer execution-time estimates. (In reality those estimates are overstated -- the entire query takes 40-60 seconds -- but that is still way too long.)

Below is the execution plan after creating a temp table (and using the IN clause). You're correct that in this case, Teradata recognizes that the row counts will be much smaller.

1) First, we lock a distinct big_edw."pseudo table" for read on a RowHash to prevent global deadlock for big_edw.r. 2) Next, we lock a distinct big_edw."pseudo table" for read on a RowHash to prevent global deadlock for big_edw.m. 3) We lock big_edw.r in view our_dw.sales_detail for read, and we lock big_edw.m in view our_dw.sales_detail for read. 4) We do an all-AMPs RETRIEVE step from JSMITH.temp_codes by way of an all-rows scan with no residual conditions into Spool 5 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 5 by the sort key in spool field1 ( JSMITH.temp_codes.InventoryCode) eliminating duplicate rows. The size of Spool 5 is estimated with high confidence to be 12 rows ( 432 bytes). The estimated time for this step is 0.01 seconds. 5) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 4 (all_amps), which is duplicated on all AMPs. Then we do a SORT to order Spool 4 by the hash code of (JSMITH.temp_codes.InventoryCode). The size of Spool 4 is estimated with high confidence to be 144 rows (5,184 bytes). 6) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to big_edw.r in view our_dw.sales_detail by way of a traversal of index # 4 without accessing the base table extracting row ids only. Spool 4 and big_edw.r are joined using a nested join, with a join condition of ("big_edw.r.InventoryCode = InventoryCode"). The input table big_edw.r will not be cached in memory. The result goes into Spool 6 (all_amps), which is built locally on the AMPs. Then we do a SORT to order Spool 6 by field Id 1. The size of Spool 6 is estimated with low confidence to be 6,075 rows (182,250 bytes). The estimated time for this step is 0.07 seconds. 
7) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an all-rows scan, which is joined to 2 partitions of big_edw.r in view our_dw.sales_detail with a condition of ("(big_edw.r in view our_dw.sales_detail.InventoryTypeID = 14) OR (big_edw.r in view our_dw.sales_detail.InventoryTypeID = 1)"). Spool 6 and big_edw.r are joined using a row id join, with a join condition of ("big_edw.r.InventoryCode = InventoryCode"). The input table big_edw.r will not be cached in memory. The result goes into Spool 7 (all_amps), which is redistributed by the rowkey of ( big_edw.r.InventoryCode) to all AMPs. Then we do a SORT to partition Spool 7 by rowkey. The size of Spool 7 is estimated with low confidence to be 6,075 rows (352,350 bytes). The estimated time for this step is 3.20 seconds. 8) We do an all-AMPs JOIN step from Spool 7 (Last Use) by way of a RowHash match scan, which is joined to a single partition of big_edw.m in view our_dw.sales_detail by way of a RowHash match scan with a condition of ("(big_edw.m in view our_dw.sales_detail.InventoryTypeID = 14) OR (big_edw.m in view our_dw.sales_detail.InventoryTypeID = 1)"). Spool 7 and big_edw.m are left outer joined using a rowkey-based merge join, with condition(s) used for non-matching on left table ("NOT (InventoryCode IS NULL)"), with a join condition of ("(InventoryTypeID = big_edw.m.InventoryTypeID) AND (InventoryCode = big_edw.m.InventoryCode)"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 6,075 rows (473,850 bytes). The estimated time for this step is 0.02 seconds. 9) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan , grouping by field1 ( big_edw.r.CustomerID ,big_edw.r.AccountNumber ,big_edw.r.InventoryTypeID ,big_edw.r.StoreID ,big_edw.r.SalesDate ,big_edw.r.InventoryCode ,big_edw.m.InventoryDescription). 
Aggregate Intermediate Results are computed globally, then placed in Spool 10. The size of Spool 10 is estimated with low confidence to be 6,075 rows (1,117,800 bytes). The estimated time for this step is 0.03 seconds. 10) We do an all-AMPs RETRIEVE step from Spool 10 (Last Use) by way of an all-rows scan into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 6,075 rows (716,850 bytes). The estimated time for this step is 0.02 seconds. 11) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. No rows are returned to the user as the result of statement 2.

Can you try the query below:

select r.* from our_dw.sales_detail r where r.InventoryCode in (select InventoryCode from our_dw.inventory_summary where ItemDescription like 'Widget B%' group by 1);

talk2soumya, adding "group by 1" had no effect on the execution plan; in reality it added a few seconds to execution time (46 vs. 52 seconds).

I do appreciate everyone's input on this.

Please provide the table structures and statistics information for both tables.

Also, the explain plans for the 2 SQLs below:

/* execution time: 151.8 seconds */
select r.*
from our_dw.sales_detail r
inner join temp_codes c
on r.InventoryCode = c.InventoryCode;

/* execution time: 0.5 seconds AHA! */
/* why isn't Teradata smart enough to run ALL the above queries like this ?? */
select r.*
from our_dw.sales_detail r
where r.InventoryCode in (select InventoryCode from temp_codes);

kirthi, the execution plan for the version with the temp table is posted in my "Thanks Todd" post. (Admins: an essential part of a forum is the ability to uniquely identify and link to specific posts!). I will try to post the other tomorrow.

All, what commands should I be running to get the statistics you're requesting?

Please provide the output of the below:

SHOW STATISTICS ON <table_name>;

HELP STATISTICS <table_name>;

SHOW TABLE <table_name>;

Try this:

select r.*
from our_dw.sales_detail r
inner join our_dw.inventory_summary s
on r.InventoryCode = s.InventoryCode
and s.ItemDescription like 'Widget B%';

- "Do steps in parallel" in the explain is a different parallelism. It just means that the optimizer has decided that it can execute those steps simultaneously rather than doing them sequentially. You are correct that the following step after the parallel blcok cannot be done until the longest of the "parallel" steps has finished. But this has nothing to do with why the plan is built the way it is.

- If you look closely at the plan for your temp table case and the plan for specifying the codes as constants, you will see that the commonality for the fast plan is that the NUSI on InventoryCode is used to access the big table. In both of these fast cases, the optimizer knows that there is a very small number of codes to retrieve, and thus the use of the NUSI is a great plan. In the slower plans, the commonality is that there is a full table join to the big table without using the index. In every one of those cases, the estimate for the result of the LIKE expression is a big number of codes (91,068) rather than 11. This large number causes the NUSI to look very expensive to the optimizer, so it builds the plan very differently to set up for a hash join to the large table.
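For reference, the "index # 4" used in the fast plans is that NUSI on InventoryCode. On the base table behind the sales_detail view, it would have been created with something like the following sketch (the real base-table name and DDL are not shown in this thread, so `sales_detail_base` is hypothetical):

```sql
/* hypothetical: a non-unique secondary index (NUSI) on the code column */
create index idx_inventory_code (InventoryCode)
on edw_tables.sales_detail_base;
```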

- I did not see an answer to my question about whether there are statistics on the Description column that you are doing the LIKE upon.

The entire inventory_summary view has about 101,000 records. For an execution plan to assume that a LIKE statement will return 91,000 -- 90% -- of the records is both impractical in real-world usage and highly improbable. For example, something as wide-reaching as WHERE LastName LIKE '%e%' would not return 90% of the records in any real-world database. It's nearly impossible to think of a LIKE statement that returns 90% of records unless the vast majority share a specific characteristic in their lettering. Most people, if they were just excluding 10% of their records, would use a negative filter (e.g., WHERE InventoryCode NOT LIKE 'B%' or WHERE SUBSTRING(InventoryCode FROM 1 FOR 1) NOT = 'B').

I'm going on record as saying that this is a flaw in the execution engine: if it must make an assumption, it should assume that a LIKE filter returns far less than 90% of the records in a table.

Secondly, why isn't the engine smart enough to determine that number from the start? If it so happens that the LIKE will indeed return 91,000 records, well, we're going to be here a long time anyway, so an extra half second isn't a huge deal. But it might find, as in my query, that it will only obtain 11 records, in which case the total execution time can be reduced by about 95%! So that's a second flaw, in my opinion: not only does it make a bad assumption about the results of a LIKE (flaw #1), but it believes it so much that it refuses to take a half second to confirm or repudiate the assumption.

Please tell me how to obtain the stats on the Description column that you're requesting, and I will try to post them.

HELP STATISTICS <database>.<table>; will tell you the current status of statistics on the table.

If there are not statistics on that column:

COLLECT STATISTICS COLUMN (<database>.<table>.<column>);

If you do not have the rights to collect the statistics, it might be necessary to ask your DBAs to collect them.

It is very likely that there are no statistics on the Description column. 90% is the estimate that the optimizer will use in the absence of statistics. I respect your opinion regarding this default estimate, but our experience has shown that LIKE expression selectivity varies very widely. Without statistics, the optimizer is designed to use estimates that are purposely pessimistic in order to cause it to create "safe" plans. In this case, if we were to estimate a small % selectivity and use that to build a plan that accesses via the index, then if the selectivity were actually higher, the execution cost of the query would be much higher. That is what we are trying to avoid. The actual % is not that important. We would need to estimate a very small selectivity to cause the optimizer to choose the index-based plan, since NUSI access is only valuable when it will do less access to the data blocks than a full scan/join.

Until very recently, the optimizer had no ability to partially execute a query to determine actuals. In 15.0 and 15.10 this type of capability has been introduced. So far it is for specific cases -- e.g., single-value subqueries. In those cases the optimizer will not only partially execute the query but will also take the computed values and compile them back into the plan as constants. This capability will continue to be extended to other types of subqueries and expressions over the next several releases. To date, it has not yet been applied to LIKE expressions, but that is certainly a future possibility.
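To illustrate the single-value-subquery case with a hypothetical example (not a query from this thread): in 15.0/15.10, a scalar subquery like the one below can be partially executed so that its result is folded back into the plan as a constant before the outer query is planned.

```sql
/* the max(SalesDate) subquery returns a single value, which the
   optimizer can pre-execute and treat as a literal */
select r.*
from our_dw.sales_detail r
where r.SalesDate = (select max(SalesDate) from our_dw.sales_detail);
```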

In the meantime, collecting statistics has some chance of helping. It is not guaranteed because, as noted above, the estimate needs to get quite selective before it can choose the index plan. LIKE is particularly hard to estimate accurately from the statistics histogram.
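Concretely, collecting those statistics would look something like this (assuming you or your DBAs have the rights, and using the table and column names that appear in the plans in this thread):

```sql
/* collect single-column stats on the column used in the LIKE */
collect statistics column (InventoryDescription) on edw_tables.sales_master;

/* confirm what is now collected */
help statistics edw_tables.sales_master;
```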

I am trying to replicate the following query, which takes under half a second, in a more dynamic way to expand its usage:

Fortunately, the query that returns those InventoryCodes itself takes just 0.1 second to run:

Therefore, I figured the combination of a 0.5-second query and a 0.1-second query would still be a very fast query. Unfortunately, the following takes 42 seconds:

and every other combination (JOINs, CTEs) takes even longer:

Finally, with a temp table, I was able to get the performance I expected, but only if we avoid a JOIN:

Every one of these queries returns the identical dataset. Why isn't Teradata smart enough to first get the list of InventoryCodes, then execute the rest of the query as if it had had the hard-coded list to begin with (i.e., just like it does in the original sample)? For the database to take so long must mean it is doing table scans and lookups that aren't necessary. Can I provide "hints" to help Teradata understand what is going on and speed things up?

We are on v.14.10.05.04 and using JDBC driver v.15.10.00.09 (I get similar results via SQL Assistant also)

Thank you.