All Forums > Database
barani_sachin 141 posts Joined 01/12
11 Apr 2013
Query Rewriting.

Hi All,
I have two versions(One is old which is being rewritten using some new standards) of code both claiming to accomplish the same thing; with the subset of data we tested both seems to give the same results. If you guys see any issues in the new code, pelase let me know.
 
Old Code:

SELECT  A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM    Rel_Detail A,
        (SELECT Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
         FROM   Rel_Detail
         WHERE  Ten_Dig_Val IN (SELECT Ten_Dig_Val FROM Ten_Dig_Val_list)
         GROUP BY Ten_Dig_Val) B
WHERE   A.Ten_Dig_Val = B.Ten_Dig_Val
AND     A.HIER_Rel_Detail = B.EFF_DT
AND     Lvl_Nbr = (SELECT MIN(Lvl_Nbr) FROM Rel_Detail
                   WHERE A.Ten_Dig_Val = Ten_Dig_Val AND HIER_Rel_Detail = B.EFF_DT)

 
New Approach:

-- Set up the volatile table first, then run the query against it:

CREATE VOLATILE TABLE temp3
( Ten_Dig_Val BIGINT
, Lvl_Nbr BYTEINT
, HIER_Rel_Detail DATE FORMAT 'YYYY-MM-DD'
) ON COMMIT PRESERVE ROWS;

INSERT INTO temp3
SELECT Ten_Dig_Val, Lvl_Nbr, HIER_Rel_Detail
FROM   Rel_Detail
QUALIFY ROW_NUMBER() OVER (PARTITION BY Ten_Dig_Val, HIER_Rel_Detail ORDER BY Lvl_Nbr) = 1;

COLLECT STATS ON temp3 COLUMN (Ten_Dig_Val);
COLLECT STATS ON temp3 COLUMN (Lvl_Nbr);
COLLECT STATS ON temp3 COLUMN (HIER_Rel_Detail);

SELECT  A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM    Rel_Detail A
INNER JOIN
        (SELECT Temp1.Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
         FROM   Rel_Detail Temp1
         INNER JOIN Ten_Dig_Val_list Temp2
         ON     Temp1.Ten_Dig_Val = Temp2.Ten_Dig_Val
         GROUP BY Temp1.Ten_Dig_Val) B
ON      A.Ten_Dig_Val = B.Ten_Dig_Val
AND     A.HIER_Rel_Detail = B.EFF_DT
INNER JOIN temp3 t3
ON      t3.Lvl_Nbr = A.Lvl_Nbr
AND     t3.Ten_Dig_Val = A.Ten_Dig_Val
AND     t3.HIER_Rel_Detail = B.EFF_DT;
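For reference, this is roughly how we compared the two versions on a small sample. It's a sketch in Python with an in-memory SQLite database, not Teradata: the sample rows are invented, and since SQLite has no QUALIFY, the temp3 dedup is written as a derived table with ROW_NUMBER:

```python
import sqlite3

# Toy stand-ins for Rel_Detail and Ten_Dig_Val_list (sample rows are invented).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Rel_Detail (Ten_Dig_Val INTEGER, SUP_Ten_Dig_Val INTEGER,
                         Lvl_Nbr INTEGER, HIER_Rel_Detail TEXT);
CREATE TABLE Ten_Dig_Val_list (Ten_Dig_Val INTEGER);
INSERT INTO Rel_Detail VALUES
  (1, 10, 2, '2013-01-01'),
  (1, 11, 1, '2013-03-01'),   -- max date for 1, min level -> expected in output
  (1, 12, 3, '2013-03-01'),
  (2, 20, 5, '2013-02-01'),   -- max date for 2, min level -> expected in output
  (2, 21, 7, '2013-01-15');
INSERT INTO Ten_Dig_Val_list VALUES (1), (2);
""")

# Old version: correlated subquery for MIN(Lvl_Nbr) at the MAX(HIER_Rel_Detail).
old_sql = """
SELECT A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM   Rel_Detail A,
       (SELECT Ten_Dig_Val, MAX(HIER_Rel_Detail) AS EFF_DT
        FROM   Rel_Detail
        WHERE  Ten_Dig_Val IN (SELECT Ten_Dig_Val FROM Ten_Dig_Val_list)
        GROUP BY Ten_Dig_Val) B
WHERE  A.Ten_Dig_Val = B.Ten_Dig_Val
AND    A.HIER_Rel_Detail = B.EFF_DT
AND    Lvl_Nbr = (SELECT MIN(Lvl_Nbr) FROM Rel_Detail
                  WHERE A.Ten_Dig_Val = Ten_Dig_Val
                  AND   HIER_Rel_Detail = B.EFF_DT)
"""

# New version: join to the deduplicated rows (the ROW_NUMBER derived table
# plays the role of the temp3 volatile table with QUALIFY).
new_sql = """
SELECT A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM   Rel_Detail A
JOIN  (SELECT Temp1.Ten_Dig_Val, MAX(Temp1.HIER_Rel_Detail) AS EFF_DT
       FROM   Rel_Detail Temp1
       JOIN   Ten_Dig_Val_list Temp2 ON Temp1.Ten_Dig_Val = Temp2.Ten_Dig_Val
       GROUP BY Temp1.Ten_Dig_Val) B
  ON  A.Ten_Dig_Val = B.Ten_Dig_Val AND A.HIER_Rel_Detail = B.EFF_DT
JOIN  (SELECT Ten_Dig_Val, Lvl_Nbr, HIER_Rel_Detail,
              ROW_NUMBER() OVER (PARTITION BY Ten_Dig_Val, HIER_Rel_Detail
                                 ORDER BY Lvl_Nbr) AS rn
       FROM Rel_Detail) t3
  ON  t3.rn = 1 AND t3.Ten_Dig_Val = A.Ten_Dig_Val
  AND t3.Lvl_Nbr = A.Lvl_Nbr AND t3.HIER_Rel_Detail = B.EFF_DT
"""

old_rows = sorted(con.execute(old_sql).fetchall())
new_rows = sorted(con.execute(new_sql).fetchall())
print(old_rows)              # [(1, 11), (2, 20)]
print(old_rows == new_rows)  # True
```

On this sample both versions return the same rows, though of course that does not prove equivalence for all data.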

 
Thanks in advance :)

barani_sachin 141 posts Joined 01/12
15 Apr 2013

Hi,
Could anyone please confirm whether both snippets implement the same logic?

dnoeth 4628 posts Joined 11/04
15 Apr 2013

Yes, both seem to do the same thing, but why do you create a Volatile Table instead of directly using it as a Derived Table?
I'm not sure, but you probably want the minimum Lvl_Nbr for the maximum HIER_Rel_Detail, which can be written much more simply:

SELECT  A.Ten_Dig_Val, A.SUP_Ten_Dig_Val
FROM    Rel_Detail A
JOIN    Ten_Dig_Val_list Temp2
ON      A.Ten_Dig_Val = Temp2.Ten_Dig_Val
QUALIFY
   ROW_NUMBER()
   OVER (PARTITION BY A.Ten_Dig_Val
         ORDER BY A.HIER_Rel_Detail DESC, A.Lvl_Nbr) = 1
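Ordering by HIER_Rel_Detail descending and Lvl_Nbr ascending and keeping row number 1 picks exactly the minimum Lvl_Nbr within the maximum date per Ten_Dig_Val. A sketch of this on toy data, using SQLite from Python (sample rows are invented; SQLite has no QUALIFY, so the ROW_NUMBER filter goes into a derived table):

```python
import sqlite3

# Invented sample data mirroring the thread's table names.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Rel_Detail (Ten_Dig_Val INTEGER, SUP_Ten_Dig_Val INTEGER,
                         Lvl_Nbr INTEGER, HIER_Rel_Detail TEXT);
CREATE TABLE Ten_Dig_Val_list (Ten_Dig_Val INTEGER);
INSERT INTO Rel_Detail VALUES
  (1, 10, 2, '2013-01-01'),
  (1, 11, 1, '2013-03-01'),
  (1, 12, 3, '2013-03-01'),
  (2, 20, 5, '2013-02-01'),
  (2, 21, 7, '2013-01-15');
INSERT INTO Ten_Dig_Val_list VALUES (1), (2);
""")

# QUALIFY ROW_NUMBER() ... = 1 rewritten as a derived table for SQLite:
sql = """
SELECT Ten_Dig_Val, SUP_Ten_Dig_Val
FROM  (SELECT A.Ten_Dig_Val, A.SUP_Ten_Dig_Val,
              ROW_NUMBER() OVER (PARTITION BY A.Ten_Dig_Val
                                 ORDER BY A.HIER_Rel_Detail DESC,
                                          A.Lvl_Nbr) AS rn
       FROM Rel_Detail A
       JOIN Ten_Dig_Val_list Temp2 ON A.Ten_Dig_Val = Temp2.Ten_Dig_Val)
WHERE rn = 1
"""
rows = sorted(con.execute(sql).fetchall())
print(rows)  # [(1, 11), (2, 20)]
```

The single ordered analytic pass replaces both the MAX-per-group derived table and the correlated MIN subquery.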

Dieter


barani_sachin 141 posts Joined 01/12
15 Apr 2013

Thanks Dieter for your help and time :)
    The data volume was around 30 to 40 million rows, so I thought that inserting into a volatile table and collecting stats might help reduce the execution time.
    Yes Dieter, that's what the query does. Thanks again for this elegant and simple solution; I will try this method as well.
 
 

Huzain 3 posts Joined 02/13
29 May 2013

Hi Dieter,
Could you please look into my problem? I am new to this site and need your help.
We have a problem with a query in which the aggregation step takes a long time to complete, as shown in the explain plan below:
We do an all-AMPs SUM step to aggregate from Spool 6 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     PDWRKCDR.A.CALL_START_DT ,PDWRKCDR.A.ACCS_METH_ID
     ,PDWRKCDR.A.ACCS_METH_VAL ,PDWRKCDR.A.CCB_SBSCRPN_ID
     ,PDWRKCDR.A.TELP_TYPE).  Aggregate Intermediate Results are
     computed globally, then placed in Spool 13.  The aggregate spool
     file will not be cached in memory.  The size of Spool 13 is
     estimated with low confidence to be 25,734,269 rows (
     41,226,298,938 bytes).  The estimated time for this step is 1 hour
     and 1 minute.
 
  8) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 4 are sent back to the user as the result of
     statement 1.  The total estimated time is 1 hour and 3 minutes.
 
The stats were stale, but the data is not changing, so I recollected them; there was no difference in response time.
 
The problem seems related to the SUM aggregates in the SELECT: the query has many SUM aggregates and CASE WHEN conditions (around 320 conditions, in 2000+ lines of SQL) over very big data (VOL_CDR_PRE is around 740 million records).
 
How would I optimize the aggregation in order to optimize the query?
 
Thanks

dnoeth 4628 posts Joined 11/04
02 Jun 2013

Did you compare the actual runtime and CPU/IO from DBQL (preferably QryLogSteps)?
You didn't show the full SQL, but I assume there are some joins in the previous steps. Depending on the actual data/PK/FK, you might try to aggregate the large table(s) before the join. Otherwise, materialize the data without aggregation in a Volatile Table with a PI on the GROUP BY columns and then aggregate on this VT; that way you'll get "computed locally".
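The "aggregate before the join" idea can be sketched like this. Purely illustrative: the fact/dim tables, cust_id, region, and sample rows are invented stand-ins for the real tables, and the rewrite is only safe when the deferred join is on a unique key, so no rows fan out. Sketched in Python with SQLite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical tables: a large fact table and a small dimension joined on its key.
CREATE TABLE fact (cust_id INTEGER, amt INTEGER);
CREATE TABLE dim  (cust_id INTEGER PRIMARY KEY, region TEXT);
INSERT INTO fact VALUES (1, 5), (1, 7), (2, 3), (2, 4), (3, 9);
INSERT INTO dim  VALUES (1, 'EU'), (2, 'EU'), (3, 'US');
""")

# Join first, then aggregate: every fact row is carried through the join.
join_then_agg = """
SELECT d.region, SUM(f.amt)
FROM fact f JOIN dim d ON f.cust_id = d.cust_id
GROUP BY d.region
"""

# Aggregate the large table first, then join the (much smaller) partial sums.
# Legal here because dim.cust_id is unique, so the join cannot fan rows out.
agg_then_join = """
SELECT d.region, SUM(p.amt)
FROM  (SELECT cust_id, SUM(amt) AS amt FROM fact GROUP BY cust_id) p
JOIN  dim d ON p.cust_id = d.cust_id
GROUP BY d.region
"""

a = sorted(con.execute(join_then_agg).fetchall())
b = sorted(con.execute(agg_then_join).fetchall())
print(a)       # [('EU', 19), ('US', 9)]
print(a == b)  # True
```

With 740 million fact rows, the early GROUP BY shrinks the join input dramatically, which is the whole point of the rewrite.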
 
 
Dieter
 


Huzain 3 posts Joined 02/13
02 Apr 2015

There is a spool issue with the query; there are no NULL values in the join condition.
Stats are up to date, but the issue persists.
 
 
SELECT AL_EFAT_CNL_DIM.CNL_NAME, TDIM.EMAIL_TALEP_F, TDIM.BASILI_TALEP_F, TDIM.SMS_BILGI_TALEP_F, TDIM.EFATURA_POSTA_KUTUSU_F
FROM VP_SEM.CNL_DIM AL_EFAT_CNL_DIM
RIGHT JOIN VP_SEM.EVNT_CNL ECNL ON AL_EFAT_CNL_DIM.CNL_ID = ECNL.CNL_ID
RIGHT JOIN VP_SEM.ETIYA_FATURA_TALEP_DIM TDIM ON ECNL.CNL_ID = TDIM.STRT_CNL_ID
 
Explain plan:
 
1) First, we lock DP_ETL_SKEY.CHANNEL_KEY in view VP_SEM.CNL_DIM for
   access, we lock DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM in view
   VP_SEM.ETIYA_FATURA_TALEP_DIM for access, we lock
   DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL for access, and we
   lock DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM for access.
2) Next, we do an all-AMPs RETRIEVE step from DP_CORE_EDW.CNL in view
   VP_SEM.CNL_DIM by way of an all-rows scan with a condition of
   ("DP_CORE_EDW.CNL in view VP_SEM.CNL_DIM.DWH_STATUS = 'AC'") into
   Spool 4 (all_amps), which is duplicated on all AMPs. The size of
   Spool 4 is estimated with low confidence to be 230,688 rows
   (11,995,776 bytes). The estimated time for this step is 0.04
   seconds.
3) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an
   all-rows scan, which is joined to 10 partitions of
   DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL with a condition of
   ("DP_CORE_EDW.EVNT_CNL in view VP_SEM.EVNT_CNL.DWH_CLOSED_LOAD IS
   NULL"). Spool 4 and DP_CORE_EDW.EVNT_CNL are right outer joined
   using a dynamic hash join, with condition(s) used for non-matching
   on right table ("NOT (DP_CORE_EDW.EVNT_CNL.CNL_ID IS NULL)"), with
   a join condition of ("CNL_ID = DP_CORE_EDW.EVNT_CNL.CNL_ID"). The
   result goes into Spool 5 (all_amps), which is duplicated on all
   AMPs. Then we do a SORT to order Spool 5 by the hash code of
   (DP_CORE_EDW.EVNT_CNL.CNL_ID). The result spool file will not be
   cached in memory. The size of Spool 5 is estimated with low
   confidence to be 458,469,681,120 rows (25,674,302,142,720 bytes).
   The estimated time for this step is 19 hours and 43 minutes.
4) We do an all-AMPs RETRIEVE step from
   DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM in view
   VP_SEM.ETIYA_FATURA_TALEP_DIM by way of an all-rows scan with no
   residual conditions into Spool 7 (all_amps), which is built
   locally on the AMPs. Then we do a SORT to order Spool 7 by the
   hash code of (DP_CORE_SEM.ETIYA_FATURA_TALEP_DIM.STRT_CNL_ID).
   The size of Spool 7 is estimated with high confidence to be
   12,691,911 rows (469,600,707 bytes). The estimated time for this
   step is 0.86 seconds.
5) We do an all-AMPs JOIN step from Spool 5 (Last Use) by way of a
   RowHash match scan, which is joined to Spool 7 (Last Use) by way
   of a RowHash match scan. Spool 5 and Spool 7 are right outer
   joined using a merge join, with condition(s) used for non-matching
   on right table ("NOT (STRT_CNL_ID IS NULL)"), with a join
   condition of ("CNL_ID = STRT_CNL_ID"). The result goes into Spool
   3 (all_amps), which is built locally on the AMPs. The result
   spool file will not be cached in memory. The size of Spool 3 is
   estimated with low confidence to be 128,621,109,816,881 rows (***
   bytes). The estimated time for this step is 1,111 hours and 49
   minutes.
6) We do an all-AMPs SAMPLING step from Spool 3 (Last Use) by way of
   an all-rows scan into Spool 1 (group_amps), which is built locally
   on the AMPs. Samples are specified as a number of rows.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
   in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of

dnoeth 4628 posts Joined 11/04
02 Apr 2015

Stats might be up to date, but at least one, on VP_SEM.CNL_DIM.DWH_STATUS, is missing; DIAGNOSTIC HELPSTATS ON FOR SESSION; will reveal this.
The estimates are horrible. What are the actual row counts for those tables?
It seems the optimizer has no information about uniqueness, or the join is not correctly using the logical PKs/FKs.
Can you provide more details, DDL/PI, stats?

Dieter
