All Forums Database
Piotr_Skrzypiec 11 posts Joined 09/13
06 Oct 2013
Primary index + Partition Key + single AMP operation

Hi Teradatares :-)
I am new to Teradata database. Currently I am thiking of creating "big" table that should grow 200 mln rows each week. After a year data it will be 10 000 000 000 rows
After reading some best practice, documentation and forum, that's what I kind of understand.
Create table test
( sample_population_date date,
 campaign_name varchar(200),
 country  varchar(200)
Primary index (sample_population_date)
Partition by range_n (sample_population_date between 01/01/2013 and 01/01/2010 each interval '1' day)
This table will contain campaign population (list of users) that is going to be run everyday) and select queries are going to be affecting one partition.
I want to take benefit of single AMP performance(over nAMPs query) and partitioning.
For this reason I want to Primary index to distribute rows for the same date (campaign) to be store on single amp, additionallly I am paritioning by that date to have many "full" partitions on single AMP.
Single partition size in rows is 200 000 000.
Is it the best performance big table design in teradata? Is there any better way of achiving one big table that will perform for hundreds of million rows in very fast way?
Any negative consequence for this approach like stitistics collections?

dnoeth 4628 posts Joined 11/04
06 Oct 2013

Hi Piotr,
no, no, no, no, don't do that!
The number of rows per PI-value should be as low as possible, as a rule of thumb they should fit on a datablock, i.e. up to a three or four digit number. When the table is partitioned you might get that number per partition. Otherwise you need some serious reasons, but never millions of rows in one partition on a single AMP.
Everything would be really slow, e.g. reading all the rows for a given date would read data from a single AMP's disks instead of all disks in the system.
You need a better PI and sample_population_date might be part of it.
Please read the Database Design manual, there's a module on Primry Indexes including the criteria for selecting a PI.
Btw, what is your DBMS background? Oracle, SQL Server, etc.?


Piotr_Skrzypiec 11 posts Joined 09/13
06 Oct 2013

Ok, I understand that would be to much for single AMP to handle millions of rows.( I am new to TD so I prefered to ask :-) )
I will make PI distributed as evenly as possible , i.e. using ( userid , campaign_name ) that should be unique combination accross table and partition campaign date to allow partitioning prunning.
When that comes to single AMP query, then is it adviced only for a simple quering using Pirmary Index?
I did Oracle mostly, Postgres(Amazon Redshift), SQL Server (not a big fan)

s@ir@m 35 posts Joined 05/13
14 Oct 2013

Hi Dietr.
assume i have a 1 millions records file target table is having  columns like(id,name,gender)
gender column we are mentioned PI(NUPI).
source gender data like(Male,Female,Unknown),
how many AMP's storing the 1 million records

M.Saeed Khurram 544 posts Joined 09/12
15 Oct 2013

Well the first thing is you must try to define a column with unique values as PI to get the even distribution and take benifit of Teradata parallelism.


a column like gender with such a small range of values is always discouraged to be a candidate for PI. But if in anycase you have to define the Gender as PI, then I think that it will involve 3 amps. But for 1 million rows it will cause high skewness in these amps, and the operation will not be successful due to amp skewness.




s@ir@m 35 posts Joined 05/13
17 Oct 2013

Thanks Lot...........

You must sign in to leave a comment.