All Forums Database
Zolo000 1 post Joined 03/16
02 Mar 2016
Why Predicate Push Down mechanism doesn`t happen?

Hello!

I`m interested in Predicate Push Down mechanism in Teradata.
Could you possibly explain why it doesn`t happen in example below and give a piece of advice how to fix it?
Thanks in advance.
Description of the example:
UAT_DM_CF.DM_CF_CARD_TURN - table with customer's salary information month-by-month. Primary Index: AGREEMENT_RK. Primary Key: AGREEMENT_RK, YEAR_MONTH. CUSTOMER_MDM_ID - customer identifier, that changes over time.

UAT_DM_CF.TECH_MDM_RELATIONSHIP - table with history of all relationships between "old" and "new" customer identifiers. This table are used to sync customer identifiers in UAT_DM_CF.DM_CF_CARD_TURN. Primary Index: PREVIOUS_CUSTOMER_ID. Primary Key: PREVIOUS_CUSTOMER_ID, EFFECTIVE_TO_DTTM.

UAT_VDM_CF.TECH_MDM_RELATIONSHIP_ACT - view with actual relationship of "old" and "new" customer identifiers (view).
It has the following structure:

REPLACE VIEW UAT_VDM_CF.TECH_MDM_RELATIONSHIP_ACT AS LOCKING ROW FOR ACCESS
   SELECT
        PREVIOUS_CUSTOMER_RK,
        PREVIOUS_CUSTOMER_ID,
        CUSTOMER_RK,
        CUSTOMER_ID
   FROM
        UAT_DM_CF.TECH_MDM_RELATIONSHIP
   WHERE
        EFFECTIVE_TO_DTTM = CAST('5999-12-31 00:00:00' AS TIMESTAMP(0))
        AND DELETED_FLG = '0';

UAT_VDM_CF.DM_CF_CARD_TURN - view with synchronized customer identifiers.
It has the following structure:

REPLACE VIEW UAT_VDM_CF.DM_CF_CARD_TURN AS LOCKING ROW FOR ACCESS
   SELECT
        crdt.AGREEMENT_RK,
        crdt.CONTRACT_ID,
        crdt.YEAR_MONTH,
        COALESCE(rel.CUSTOMER_ID, crdt.CUSTOMER_MDM_ID) as CUSTOMER_MDM_ID,
        crdt.CUSTOMER_MDM_ID as OLD_CUSTOMER_MDM_ID,
        crdt.CURRENCY_ISO_ID,
        crdt.SALARY_FLG,
        crdt.DEBET_TURN_AMT,
        crdt.DEBET_TURN_SALARY_AMT,
        crdt.CREDIT_TURN_AMT,
        crdt.BALANCE_AMT,
        crdt.EMPLOYER_TAX_PAYER_NUM,
        crdt.SOURCE_SYSTEM_CD,
        crdt.BUSINESS_DTTM,
        crdt.PROCESSED_DTTM,
        crdt.LAYER_ID,
        crdt.LOAD_ID
   FROM
        UAT_DM_CF.DM_CF_CARD_TURN as crdt
   LEFT JOIN
        UAT_VDM_CF.TECH_MDM_RELATIONSHIP_ACT as rel
   ON
        crdt.CUSTOMER_MDM_ID = rel.PREVIOUS_CUSTOMER_ID;

After executing two following queries we surprisingly got dramatic difference in response time:

select * from UAT_DM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123'; --0.02 seconds
select * from UAT_VDM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123'; --17 minutes and 42 seconds

The question is why in the second query the condition CUSTOMER_MDM_ID = '123' applies after dynamic hash join ?
Explain Texts:

Explain select * from UAT_DM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123';

  1) First, we lock a distinct UAT_DM_CF."pseudo table" for read on a
     RowHash to prevent global deadlock for UAT_DM_CF.DM_CF_CARD_TURN.
  2) Next, we lock UAT_DM_CF.DM_CF_CARD_TURN for read.
  3) We do an all-AMPs RETRIEVE step from UAT_DM_CF.DM_CF_CARD_TURN by
     way of index # 4 "UAT_DM_CF.DM_CF_CARD_TURN.CUSTOMER_MDM_ID =
     '123'" with a residual condition of (
     "UAT_DM_CF.DM_CF_CARD_TURN.CUSTOMER_MDM_ID = '123'") into Spool 1
     (group_amps), which is built locally on the AMPs.  The size of
     Spool 1 is estimated with high confidence to be 43 rows (19,307
     bytes).  The estimated time for this step is 0.02 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
     No rows are returned to the user as the result of statement 2.
     The total estimated time is 0.02 seconds.
Explain select * from UAT_VDM_CF.DM_CF_CARD_TURN where CUSTOMER_MDM_ID = '123';

  1) First, we lock UAT_DM_CF.TECH_MDM_RELATIONSHIP in view
     UAT_VDM_CF.DM_CF_CARD_TURN for access, and we lock UAT_DM_CF.crdt
     in view UAT_VDM_CF.DM_CF_CARD_TURN for access.
  2) Next, we do an all-AMPs RETRIEVE step from a single partition of
     UAT_DM_CF.TECH_MDM_RELATIONSHIP in view UAT_VDM_CF.DM_CF_CARD_TURN
     with a condition of ("UAT_DM_CF.TECH_MDM_RELATIONSHIP in view
     UAT_VDM_CF.DM_CF_CARD_TURN.EFFECTIVE_TO_DTTM = TIMESTAMP
     '5999-12-31 00:00:00'") with a residual condition of (
     "(UAT_DM_CF.TECH_MDM_RELATIONSHIP.EFFECTIVE_TO_DTTM = TIMESTAMP
     '5999-12-31 00:00:00') AND (UAT_DM_CF.TECH_MDM_RELATIONSHIP in
     view UAT_VDM_CF.DM_CF_CARD_TURN.DELETED_FLG = '0')") into Spool 2
     (all_amps) (compressed columns allowed), which is duplicated on
     all AMPs.  The size of Spool 2 is estimated with high confidence
     to be 1,256,112 rows (199,721,808 bytes).  The estimated time for
     this step is 0.15 seconds.
  3) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
     all-rows scan, which is joined to UAT_DM_CF.crdt in view
     UAT_VDM_CF.DM_CF_CARD_TURN by way of an all-rows scan with no
     residual conditions.  Spool 2 and UAT_DM_CF.crdt are right outer
     joined using a dynamic hash join, with condition(s) used for
     non-matching on right table ("NOT (UAT_DM_CF.crdt.CUSTOMER_MDM_ID
     IS NULL)"), with a join condition of (
     "UAT_DM_CF.crdt.CUSTOMER_MDM_ID = PREVIOUS_CUSTOMER_ID").  The
     input table UAT_DM_CF.crdt will not be cached in memory.  The
     result goes into Spool 3 (all_amps) (compressed columns allowed),
     which is built locally on the AMPs.  The result spool file will
     not be cached in memory.  The size of Spool 3 is estimated with
     low confidence to be 1,207,507,023 rows (527,680,569,051 bytes).
     The estimated time for this step is 7 minutes and 20 seconds.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan with a condition of ("(( CASE WHEN (NOT
     (CUSTOMER_ID IS NULL )) THEN (CUSTOMER_ID) ELSE (CUSTOMER_MDM_ID)
     END ))= '123'") into Spool 1 (group_amps), which is built locally
     on the AMPs.  The result spool file will not be cached in memory.
     The size of Spool 1 is estimated with low confidence to be
     1,207,507,023 rows (624,281,130,891 bytes).  The estimated time
     for this step is 10 minutes and 21 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.
     No rows are returned to the user as the result of statement 2.
     The total estimated time is 17 minutes and 42 seconds.

The tables have the following structure:

show table UAT_DM_CF.DM_CF_CARD_TURN;

CREATE MULTISET TABLE UAT_DM_CF.DM_CF_CARD_TURN ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      CUSTOMER_MDM_ID VARCHAR(100) CHARACTER SET UNICODE CASESPECIFIC,
      CONTRACT_ID VARCHAR(50) CHARACTER SET UNICODE CASESPECIFIC,
      SOURCE_SYSTEM_CD VARCHAR(10) CHARACTER SET UNICODE CASESPECIFIC COMPRESS ('00006','00040','00018','00051','00000'),
      YEAR_MONTH DATE FORMAT 'yyyy-mm-dd',
      SALARY_FLG CHAR(1) CHARACTER SET UNICODE CASESPECIFIC COMPRESS ('0','1'),
      CURRENCY_ISO_ID VARCHAR(3) CHARACTER SET UNICODE CASESPECIFIC COMPRESS '810',
      DEBET_TURN_SALARY_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
      CREDIT_TURN_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
      DEBET_TURN_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
      BALANCE_AMT DECIMAL(23,5) COMPRESS 0.00000 ,
      BUSINESS_DTTM TIMESTAMP(0) NOT NULL,
      PROCESSED_DTTM TIMESTAMP(0) NOT NULL,
      AGREEMENT_RK DECIMAL(18,0) NOT NULL,
      LAYER_ID INTEGER NOT NULL,
      LOAD_ID INTEGER NOT NULL,
      EMPLOYER_TAX_PAYER_NUM VARCHAR(200) CHARACTER SET UNICODE CASESPECIFIC COMPRESS )
PRIMARY INDEX ( AGREEMENT_RK )
PARTITION BY RANGE_N(YEAR_MONTH  BETWEEN DATE '2010-11-01' AND DATE '2020-12-01' EACH INTERVAL '1' MONTH ,
 NO RANGE)
INDEX ( CUSTOMER_MDM_ID );
show table UAT_DM_CF.TECH_MDM_RELATIONSHIP;

CREATE MULTISET TABLE UAT_DM_CF.TECH_MDM_RELATIONSHIP ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      PREVIOUS_CUSTOMER_RK INTEGER NOT NULL,
      PREVIOUS_CUSTOMER_ID VARCHAR(100) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      PREVIOUS_CUSTOMER_TYPE_CD CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      CUSTOMER_RK INTEGER NOT NULL,
      CUSTOMER_ID VARCHAR(100) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      CUSTOMER_TYPE_CD CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      SOURCE_SYSTEM_CD VARCHAR(10) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      FILE_ID INTEGER NOT NULL,
      PROCESSED_DTTM TIMESTAMP(0) NOT NULL,
      EFFECTIVE_FROM_DTTM TIMESTAMP(0) NOT NULL,
      EFFECTIVE_TO_DTTM TIMESTAMP(0) NOT NULL,
      LOAD_ID INTEGER NOT NULL,
      DELETED_FLG CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      IS_ACTIVE_FLG CHAR(1) CHARACTER SET UNICODE CASESPECIFIC NOT NULL)
PRIMARY INDEX ( PREVIOUS_CUSTOMER_ID )
PARTITION BY RANGE_N(CAST((EFFECTIVE_TO_DTTM ) AS DATE AT TIME ZONE INTERVAL '3:00' HOUR TO MINUTE ) BETWEEN DATE '2010-01-01' AND DATE '2017-12-31' EACH INTERVAL '3' DAY ,
DATE '5999-12-31' AND DATE '5999-12-31' EACH INTERVAL '1' DAY ,
 NO RANGE)
INDEX ( PREVIOUS_CUSTOMER_RK );

The tables have the following statistics:

show statistics on UAT_DM_CF.DM_CF_CARD_TURN;

COLLECT STATISTICS
                   -- default SYSTEM SAMPLE PERCENT
                   -- default THRESHOLD 10 DAYS
                   -- default THRESHOLD 10.00 PERCENT
            COLUMN ( AGREEMENT_RK ) ,
            COLUMN ( CUSTOMER_MDM_ID )
                ON UAT_DM_CF.DM_CF_CARD_TURN ;

    Date        Time            Unique Value        Column Names        Column Dictionary Name        Column SQL Names    Column Names UEscape
1    16/03/01    14:54:56           1,207,507,022     *                    *                            "*"                 NULL
2    16/02/19    11:27:39              37,533,183     AGREEMENT_RK        AGREEMENT_RK                 AGREEMENT_RK         NULL
3    16/03/01    14:54:56              18,000,808     CUSTOMER_MDM_ID     CUSTOMER_MDM_ID              CUSTOMER_MDM_ID      NULL
show statistics on UAT_DM_CF.TECH_MDM_RELATIONSHIP;

COLLECT STATISTICS
                   -- default SYSTEM SAMPLE PERCENT
                   -- default THRESHOLD 10 DAYS
                   -- default THRESHOLD 10.00 PERCENT
            COLUMN ( PREVIOUS_CUSTOMER_ID ) ,
            COLUMN ( CUSTOMER_ID ) ,
            COLUMN ( CUSTOMER_RK ) ,
            COLUMN ( PREVIOUS_CUSTOMER_RK ) ,
            COLUMN ( EFFECTIVE_TO_DTTM,DELETED_FLG ) ,
            COLUMN ( EFFECTIVE_TO_DTTM ) ,
            COLUMN ( DELETED_FLG )
                ON UAT_DM_CF.TECH_MDM_RELATIONSHIP ;

      Date          Time      Unique Value      Column Names                      Column Dictionary Name          Column SQL Names          Column Names UEscape
1     16/03/01    14:58:19         16,250            *                               *                               "*"                             NULL
2     16/02/18    20:05:36         16,143            PREVIOUS_CUSTOMER_ID            PREVIOUS_CUSTOMER_ID            PREVIOUS_CUSTOMER_ID            NULL
3     16/02/18    20:04:54          8,001            CUSTOMER_ID                     CUSTOMER_ID                     CUSTOMER_ID                     NULL
4     16/02/18    20:04:55          8,001            CUSTOMER_RK                     CUSTOMER_RK                     CUSTOMER_RK                     NULL
5     16/02/18    20:05:38         16,143            PREVIOUS_CUSTOMER_RK            PREVIOUS_CUSTOMER_RK            PREVIOUS_CUSTOMER_RK            NULL
6     16/03/01    14:58:17             13            EFFECTIVE_TO_DTTM,DELETED_FLG   EFFECTIVE_TO_DTTM,DELETED_FLG   EFFECTIVE_TO_DTTM,DELETED_FLG   NULL
7     16/03/01    14:58:18             12            EFFECTIVE_TO_DTTM               EFFECTIVE_TO_DTTM               EFFECTIVE_TO_DTTM               NULL
8     16/03/01    14:58:19              2            DELETED_FLG                     DELETED_FLG                     DELETED_FLG                     NULL
dnoeth 4628 posts Joined 11/04
03 Mar 2016

The optmizer can't push this, because CUSTOMER_MDM_ID is a column in query #1, but the result of a calculation in #2:  COALESCE(rel.CUSTOMER_ID, crdt.CUSTOMER_MDM_ID)
 

Dieter

You must sign in to leave a comment.