All Forums Data Modeling
aarsh.dave 24 posts Joined 11/12
20 Jan 2014
Using a PPI column in PI

Hi All,
I have a table that has PI consisting of Col_A and Col_B. Col_C is a timestamp(6) field which is a PPI.
As per the current data, the table has a skew factor of 75%.
Therefore, I created an identical table, this time with PI consisting of Col_A, Col_B and Col_C, with Col_C again as PPI.
Now, when I loaded the records that are in the existing table to this new one, the skew factor is around 5%.
However, I am not sure if this has any other implications in terms of data processing times or anything else.
Please advise.
Thanks,
Aarsh

Adeel Chaudhry 773 posts Joined 04/08
05 Feb 2014

Duplicate post.
http://forums.teradata.com/forum/data-modeling/using-a-ppi-column-in-pi-0

-- If you are stuck at something .... consider it an opportunity to think anew.

raj786 23 posts Joined 04/14
02 May 2014

hi
i have  PI as colA and now the skew is 80%. i added one more col to that pi i.e my proposed pi is (colA,colB) AND the skew is 7%. is there any impact on downstream. like my table is  A->B->C->D. I am working on B if changed PI at B is it impact on C & D. How resolve this?

VandeBergB 182 posts Joined 09/06
03 May 2014

if your downstream queries don't access the new composite PI you'll have full table scans.  what defines the most common access path to the table with colA and colB?  You've got an extreme case of 80% skew and 7% skew.  I'd suspect that your initial PI doesn't define the most common access path because of the data clustering in colA.
build some test cases and see what the explains look like
 
-cheers

Some drink from the fountain of knowledge, others just gargle.

Jake Kurdsjuk 9 posts Joined 06/13
05 May 2014

So table B is second in the join path.  If the joins to Table B do not already include the new index column then there will definitely be an impact - Table B will need to undergo a full table scan if additional WHERE clause conditions exist, and then either it may need to be redistributed to complete the joins to the other tables.  Whether or not the index change impacts the other joins depends on the PI of the other tables, and if they join only on colA, or on colA & colB.
Let's assume that Table C and Table D have different PI's than Table B, both before and after the change.  This will require the row redistribution of one table to complete the join provided the join is on both colA & colB.  If the join is only on colA, then it is possible that both tables in each join will need to be redistributed (if Table B was being redistributed in the old join plan then it will likely once again be redistributed post-change, leaving the rest of the join plan unchanged).  The problem with this is that the redistribution of Table B will be by the old PI column, colA, which will likely cause the spool file created to have that 80% skew factor you're seeing currently.  This can be even more problematic than a skewed table because you will incur the performance hit of the skewed table redistribution every time this join is performed. To make matters worse, if there are no residual WHERE clause conditions that means you could be building your original table with the old PI in spool over and over again.  In this case it would have been better for you to leave the original PI alone and incur the space hit of the skewed index - all other things being equal.
If you do change the PI, and colB is available in the other tables then the joins should be changed to include both columns if they do not already do so. 

Jake Kurdsjuk
Product Manager
Teradata Communications Data Model

You must sign in to leave a comment.