All Forums Database
pratyushkhosla 2 posts Joined 07/12
01 Nov 2013
PI for slowly changing data table

We have a 3NF model implemented in Teradata and I have a table of the following format :
Surrogate Key              Cust Id       Cust Name    X1        X2        X3           X4            Effective Start Date            Effective End Date
1                                   X-Y23         John Smith     11       22        AB            CD             05-10-2009                         27-02-2010
2                                   X-Y41         Jane Doe        12     24         VC              BS            24-12-2006                         31-12-9999
3                                  X-Y23           John Smith      13    39        XZ             SE              28-02-2010                         31-12-9999
This is a huge table with millions of records and as you can see stores changes to customer records over a period of time - it is a historical data store.
What should be the PI for this table?
Should it be a NUPI on Cust Id OR
Should it be a UPI on Cust Id and Start Date, End Date with a NUSI on Cust Id
Can I have a PPI here?

Raja_KT 1246 posts Joined 07/09
01 Nov 2013


It is purely driven by requirements:


My first thought is this way (But when you spell out the requirements more then we can drill further down):


I think you need to look at the perspective of joining with other tables, say for your partitioning? Does it help to partition and what are the ranges?


Do you need all columns from  that table for joining? if not then you can think of JI with other tables to achieve performance by co-locating.


If possible and if it meets business requirements,try to divide your table into current and history to avoid scanning the monster.Maybe you need only current.


Another thought also is if possible to see that if you can store the slowly changing fields in one table and the rapidly changing fields in another table and use  them according as need.


For PI, you should look for even distribution so that you can use effectively the resources of your system.




Raja K Thaw
My wiki:
Street Children suffer not by their fault. We can help them if we want.

dnoeth 4628 posts Joined 11/04
01 Nov 2013

Hi Prat,
if the number of rows per customer is not too high you should keep cust_id as PI.
When most of the queries access the current data (= where end_date = date '9999-12-31') you might partition by
case_n(end_date = date '9999-12-31', no case) to keep all current rows in a single partition. Of course there's overhead when you change end_date.
Btw, in TD13.10 there's the temporal feature to load & query slowly changing dimensions automatically.


You must sign in to leave a comment.