sgarlapa 88 posts Joined 03/13
19 Nov 2014
help on tuning a query

SELECT col1, col2
FROM staging_table STG
INNER JOIN target_table TGT
    ON STG.SRC_TRAN_ID = TGT.SRC_TRAN_ID
WHERE TGT.ROW_STAT_CD = 'Active'
  AND TGT.SRC_SYS = 'sourcename'
In the above query, both STG and TGT have SRC_TRAN_ID as their PI.
TGT has stats collected on ROW_STAT_CD and SRC_SYS.
TGT is a huge table with 18 months of history, partitioned by a one-day interval, and stats are collected on PARTITION.
As per the business logic, I can't add a partitioning condition to this query.
In production the query takes a long time and does huge I/O. The obvious reason is the lack of partition elimination.
But is there any other way I can try to tune this query? Appreciate your help.
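For reference, the stats already in place on TGT look roughly like this (a sketch only; the actual DDL and statement form differ):

COLLECT STATISTICS COLUMN (ROW_STAT_CD) ON target_table;
COLLECT STATISTICS COLUMN (SRC_SYS) ON target_table;
COLLECT STATISTICS COLUMN (PARTITION) ON target_table;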
 
--Sri

teradata_chela2 18 posts Joined 10/13
19 Nov 2014

Have you tried creating an SI on the ROW_STAT_CD and SRC_SYS columns of the TGT table?
As far as I know, that may at least help reduce the I/O.
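For example, something like this (just a sketch using the column names from your query; check the index maintenance cost on such a big table first):

CREATE INDEX (ROW_STAT_CD, SRC_SYS) ON target_table;
COLLECT STATISTICS COLUMN (ROW_STAT_CD, SRC_SYS) ON target_table;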

Arint

dnoeth 4628 posts Joined 11/04
19 Nov 2014

Hi Sri,
can you show the explain?
What's the rowcount for both tables?
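For example (using the query exactly as posted):

EXPLAIN
SELECT col1, col2
FROM staging_table STG
INNER JOIN target_table TGT
    ON STG.SRC_TRAN_ID = TGT.SRC_TRAN_ID
WHERE TGT.ROW_STAT_CD = 'Active'
  AND TGT.SRC_SYS = 'sourcename';

SELECT COUNT(*) FROM staging_table;
SELECT COUNT(*) FROM target_table;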

Dieter

sgarlapa 88 posts Joined 03/13
21 Nov 2014

I didn't try an SI, as it is a very huge table and it is loaded by MultiLoad. I also proposed compression on both columns, since they have few distinct values, but as that is a DDL change it would need a complete rock-and-roll (full unload/reload) of the huge table, which was not accepted.
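The compression change I had proposed was roughly the following (a sketch only; the value lists are illustrative, not our actual domains):

ALTER TABLE target_table
    ADD ROW_STAT_CD COMPRESS ('Active', 'Inactive'),
    ADD SRC_SYS COMPRESS ('sourcename');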
Hi Dieter,
The count of TARGET is 3 billion and the stage count is around 2 million.
Here is the explain in UAT -
We do an all-AMPs JOIN step from APP_STG.STG by way of a RowHash match scan, which is joined to APP_TGT.TRAN_ID_VW by way of a RowHash match scan with a condition of ("(APP_TGT.TRAN_ID_VW.ROW_STAT_CD = 'A') AND (APP_TGT.TRAN_ID_VW.SRC_SYS_ID = 'SOURCE')"). APP_STG.STG and APP_TGT.TRAN_ID_VW are joined using a sliding-window merge join, with a join condition of ("APP_STG.STG.SRC_TRAN_ID = APP_TGT.TRAN_ID_VW.SRC_TRAN_ID"). The input table APP_TGT.TRAN_ID_VW will not be cached in memory. The result goes into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 29,965 rows (1,767,935 bytes). The estimated time for this step is 17.34

SmarakDas 51 posts Joined 02/12
24 Nov 2014

Hello Srilakshmi,

As per your Explain, the query is using a sliding-window merge join, which is PPI-aware. A sliding-window merge join effectively joins each group ("window") of partitions of one table against each window of partitions of the other, after grouping the partitions into windows of an appropriate size. Since your table has a large number of rows and the partitioning interval is only one day, the number of partitions is large. As a result, while joining the left and right windows of partitions, the movement of data from disk to memory (one complete pass through the partitions of one table for each window of the other) is very high, given that PPICacheThrP has a default value of only 10.

A possible solution is adding a partitioning level on the WHERE-clause columns (ROW_STAT_CD or SRC_SYS), since you mention these columns have few distinct values. That could help by eliminating whole partitions rather than individual rows. Also, check why the join step is estimated with low confidence; it is usually statistics related (no stats collected, stale stats, or a badly inflated estimate).
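A rough sketch of the kind of multilevel partitioning I mean, and of re-checking the stats (all column names, data types and the date range are illustrative, not your actual DDL; TRAN_DT is an assumed date column):

CREATE TABLE target_table_new
(
    SRC_TRAN_ID  INTEGER,
    ROW_STAT_CD  CHAR(10),
    SRC_SYS      VARCHAR(30),
    TRAN_DT      DATE
    -- ... remaining columns of the existing table
)
PRIMARY INDEX (SRC_TRAN_ID)
PARTITION BY (
    RANGE_N(TRAN_DT BETWEEN DATE '2013-06-01' AND DATE '2014-12-31' EACH INTERVAL '1' DAY),
    CASE_N(ROW_STAT_CD = 'Active', NO CASE OR UNKNOWN)
);

-- Ask the optimizer which statistics it would like for this join:
DIAGNOSTIC HELPSTATS ON FOR SESSION;
EXPLAIN
SELECT col1, col2
FROM staging_table STG
INNER JOIN target_table TGT
    ON STG.SRC_TRAN_ID = TGT.SRC_TRAN_ID
WHERE TGT.ROW_STAT_CD = 'Active'
  AND TGT.SRC_SYS = 'sourcename';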

 

Thanks,

Smarak
