All Forums Database
madhavi_kl 27 posts Joined 07/06
14 Feb 2007
PPI

hi alli read a statement saying " if a PARTITION BY is defined on a column other than a Primary Index column, then that table cannot have Unique Primary index defined on it. we can only have NON_UNIQUE Primary Index. and it may have secondary indexes."how can we justify this statement? if possible, kindly provide some example.

joedsilva 505 posts Joined 07/05
14 Feb 2007

Suppose you had an employee table with Primary index as Empno, and you wanted to partition it by deptno, you will have to make empno a NUPI though empno was unique. I believe this restriction was placed to avoid TD from probing every partition to ensure the uniqueness of the PI column(s).That being said, you still have to make sure that your empno was unique. How could you do that ?The answer is to create a USI on empno table (which is one of those scenarios where you can have two indices on the same column(s)).

madhavi_kl 27 posts Joined 07/06
14 Feb 2007

"to avaoid TD from probing every partition" in the sence?can you explain more detailed on this sentence.

joedsilva 505 posts Joined 07/05
14 Feb 2007

Let's say that empno was actually a UPI, now given an empno, we know which AMP it will go to (hashing), but we don't know which partition it belongs, without the deptno information.So let's say you have 40 depts ( = 40 partitions ) now let's say you wanted to update an employee number to a different number (ok for some weird reasons, though it's a bad idea to update a PI column ... but heck ! this is just an example )Now let's say the empno was 109 and dept no was 32. the task is to update the empno to 209. But how do you know that 209 is already not there in the table ? you know which AMP it (209) goes to via hashing, so you can send it to that AMP to verify. But how does that AMP know which partition has 209 ? it has no information on that. So it starts looking in all 40 partitions to see if 209 already exist. Which results in bad performance especially when the number of partitions are large.As I said it's just my theory as to why the restriction was placed. I haven't seen it documented anywhere though.

madhavi_kl 27 posts Joined 07/06
14 Feb 2007

thanks for your information joedsilva.but i still have some points to say.1) as you already said.. that was a weird example. when its defined as UPI, we can just forget updating the primary column.i can put my question more detailed.1) we all know to which AMP a row goes to.. from hashing algorithem. and that is done by PI,though its UPI or NUPI.2) and when we use PARTITION BY clause, it will bring all the like values of that column together(in a single partioion) in each AMP's rows depending on which column we decalred the Partition. and we are not bothered here whether its PI column or some other column.i hope you will accept to both the above points.if that is the case, why should we have a UPI, when we are putting "PARTITION BY" on some other column. becuse the first fact is not disturbing the second fact and the second fact is not disturbing the first.i hope you are able to understand the query...if you didnt understand any single point just reply back, so that i can explain you more detailed.thanks once again.

joedsilva 505 posts Joined 07/05
14 Feb 2007

Ok let's forget about the update example, now let me put a question across to you (Ok it's my turn :-) ).I am inserting a record (empno, deptno) as (409, 33);Now tell me that if empno was UPI (unique) how would you make sure 409 is already not there ?Doesn't that AMP has to check in all the 40 partitions ?

madhavi_kl 27 posts Joined 07/06
14 Feb 2007

thats a good start actually.ok.. i can explain like this here the PI is empno and the value that we are trying to insert is 409.so the PE will finalyse the AMP depending on the row-id thats obtained using hashing algorithem.so the AMP is ready.so it will place this row into the AMP. as its the UPI, it should have a duplicate row of empno as 409.if its there, the query will abort for sure.this thing will happen even without "PARTITION BY" clause also.in that case why are they so specific about PARTITION alone?

joedsilva 505 posts Joined 07/05
14 Feb 2007

You got it right to the point that it's easy to decide which AMP the record should go for 409.Ok now let's talk about the difference between PPI tables and NON PPI tables.In non PPI Tables, the table is logically sorted in the order of row hash (row ids to exact, but it won't make much of a diff because row hash is at the MSB...)So if you want to a dupe check for 409, the AMP has to just take the hash of 409 and do a search (now I needn't tell you that search algorithms could be implemented efficiently on a sorted key )for PPI tables, the records are FIRST ordered by their partition numbers and with in a partition they are ordered by hash. which means that Teradata has to check for the hash code in each partition separately. That's certainly in efficient.To give a comparable exercise draw a (balanced one for ease of understanding) binary tree of 100 nodes with numbers as data. Pick a leaf node number and count the number of comparisons you will have to make to reach that node. ( do the same exercise looking for a non-existing number )Now make 10 different binary trees each having 10 nodes and put the same numbers there. Pick the leaf node of 6th binary tree and try counting the number of comparisons you need to make starting from the first tree to reach that node. Now try searching for a number in the 10th tree and try searching for a number that doesn't exist.Welcome to the damage of partitioning.Don't cheat make sure you put 100 nodes in that tree !

madhavi_kl 27 posts Joined 07/06
15 Feb 2007

Hi joedsilvaThanks for your responses.I understand the concept of PPI and NON-PPI in a table.But my question was different.What I am concerned about is… we have PARTITION used in our table, but not on the PI column. Its on another column.In that case we should not have UPI defined on that table, instead only NUPI is allowed.Now my question here is why is that UPI not allowed?PARTITION concept is used only for grouping the rows which already distributed per AMP.(what I mean is.. By that time its already decided to which amp the row should go. Once the AMP is decided, all the rows of that table in that AMP are going to get grouped depending on the partition column value). because of this concept the searching time will get reduced as you said.thats absolutely correct.If it is the case, why is that UPI and NUPI making difference, when we are not PARTITIONING by the PI column.

joedsilva 505 posts Joined 07/05
15 Feb 2007

( You didn't do that exercise I told you ? did ya ? )Ok so we have an agreement that it takes the same amount of effort to decide to which AMP a row should go whether the table is PPI or non PPI.Ok given that I was inserting record (409, 33).So let's say AMP X is the target for hashing on 409.So the AMP is decided. Everything is cool so far.for the non PPI table AMP will use that hash, search on that hash and check if 409 is already there ( remember that the table is sorted on row hash, so search is faster )Now for the PPI table let' say we had 40 depts, then AMP has to probe in all the partitions (within a partition it's sorted on rowhash, but you still don't know the parition to look for a duplicate value) to check for the row hash of 409 that's because there could be a (409, 12) already there or may be a (409, 39) is already there ... and so on .... which means that AMP has to go to each of the 40 partitions and search for this empno to ensure that it's not already there. But the catch is it doesn't know which partitions can contain 409, so it has to search all the partitions.So what's the big deal ?Big deal is that searching over multiple partitions are highly time consuming that searching one giant partition (I mean non - PPI table).It's like the above Tree example I gave you. it's faster to search in a single tree of 100 nodes than in 10 trees of 10 nodes each. ( Seriously if you do that exercise you would understand the difference ). It's a common error to think that both operations take same time, that's why I asked you to do that exercise, in reality the amount of search time required is quite a few many times more.And if you remember in TD tables can have thousands of partitions, which makes the amount of time quite HUGE.Due to this time consuming effort in searching for duplicate PI values in all partitions, it was decided not allow a UPI.BTW I mean to say "it's inefficient to search in all the partitions" in my previous post. I am sorry if I miscommunicated that.Now let me put another question. Let's say you need to find a book in a library. you know only the name of the book, not whether it's a Chemistry book or a Physics book or so.Scenario (1)All the books irrespective of the Dept are arranged in the order of the alphabets.Scenario (2)All the books are divided into different departments. and within department the are arranged in the alphabetical order.Which would help you find the book faster given you don't know the dept to which the book belong ?P.S. Imagine you are in library of congress and you have tones of depts, for this question !

madhavi_kl 27 posts Joined 07/06
15 Feb 2007

joedsilvai have a single question for you.are you aware of the situation that we can use partition by clause on other column which is not a part of PI?ex:Create set table database1.employee_table(emp_no char(04) not null,Emp_dept char(05) not null,Emp_last char(10),Emp_first char(20))Primary index (emp_no)Partition by (emp_last);In this example we are doing partitioning other than PI column. Why I am mentioning this is.. In the replies you mentioned I can only see PPI and non-PPI.I am not at all mentioning the partitioning of the PI.

Fred 1096 posts Joined 08/04
15 Feb 2007

If you use PARTITION BY clause, Teradata will hash the PI to an AMP then "partition the PI" within the AMP. It's still called "PPI" even if the partitioning expression references a column that isn't used in the PI.

joedsilva 505 posts Joined 07/05
15 Feb 2007

And to add to what Fred said, in real world scenario, don't try partitioning on a character column like that. You may be in for a lot of surprises when you try to insert some meaningful data into it.You may also want to refer chapter 9 of the Teradata Database design manual. It has some quite picturesque diagrams and descriptions that may help you.

madhavi_kl 27 posts Joined 07/06
16 Feb 2007

thanks for all that information that you provided.and can i get the book available online?if its so... can i have that link please......

madhavi_kl 27 posts Joined 07/06
16 Feb 2007

to add another question for Fred, when we are not mentioning PARTITION by on PI column, how can it apply on it?we are specifically mentioning on which we column we need the partition/group.

joedsilva 505 posts Joined 07/05
16 Feb 2007

http://www.teradataforum.com/teradata_pdf/b035-1094-096a.pdf

Fred 1096 posts Joined 08/04
16 Feb 2007

In Teradata, the Primary Index is not a physical database object or "subtable" like other indexes are. PI is the method used to determine where rows are placed. Partitioning also deals with placement of rows, so it is viewed as an extension to the Primary Index concept.Non-partitioned tables have data distributed across the AMPs by RowHash and then ordered within the AMP by RowHash. Partitioned tables are still distributed to AMPs by RowHash, then split into partitions, then ordered by RowHash within each partition.Use the Resources / Technical Documentation link on this site (Teradata.com) to find copies of the manuals.

30 Oct 2007

This is an excellent explanation.So in short even if Teradata has allowed a Uniquness on the PI via UPI it will not serve any purpose because it anyway will search all the partitions. Atleast enforcing uniqueness will be possible through USI atleast to satisfy the business requirement. Has anybody seen any supporting explanation of this in any of the Teradata manuals?I guess what Madhavi wants to say is when it is anyway going to search all the partitions why not allow UPI. :-)

Ramakrishna Vedantam, Tata consultancy services, Hyderabad

30 Oct 2007

I think joes explanation was complete and clear.With USI, you hash the usi value and go to the subtable which gives exact location of that record which contains this USI value.I assume the subtable gives complete info which gives the AMP ID aswell as the Partition No in which we have this USI value (Since its a USI we wont be having the same value in one more partition)So again here we are not scanning all the partitions for duplicate row checking.Please correct me if im wrong!!!Regards,Annal T

joedsilva 505 posts Joined 07/05
30 Oct 2007

The point is with an NUPI it WILL NOT probe all partitions during insert (which is what we have been talking all along, not the read part !), but with UPI (if that was allowed theoretically) it will have to probe all partitions.So if you want to enforce "uniqueness", just put a USI as Annal mentioned. This would be a two amp operation, but will make sure it's unique, and the USI subtable will have the rowid and partition# which will tell you which amp & partition to lookup.Now if you are not bothered about "enforcing" uniqueness, it's better to have an NUSI on the NUPI..... it will be a one-Amp operation. That's because TD knows that the PI and SI are on the same columns, so it uses the row hash to pick the AMP, but then directly queries the NUSI subtable in THAT AMP instead of scanning all the partitions.

Rakesh2010 3 posts Joined 10/10
03 Oct 2010

Joedsilva,
I think in case of NUPI , TD has to probe each partition to add uniqueness value into ROWID during row insertion .
Correct me if I'm wrong..

Jim Chapman 449 posts Joined 09/04
04 Oct 2010

No, it doesn't work that way. Since the partition ID is part of the row ID, it is only necessary to ensure that the rest of the row ID is unique within the partition.

Rakesh2010 3 posts Joined 10/10
04 Oct 2010

Jim,
Thanks for your reply. I'm a bit confused .Suppose I'm inserting 3 distinct rows ,2 rows with same (empno, deptno) e.g twice (409, 33) and one row (409,12). so 1st two rows will reside in same partition. Then how uniquness of ROWID is ensured in that case.
Thanks in advance !!!

dnoeth 4628 posts Joined 11/04
05 Oct 2010

Within a PPI table the old 8 byte rowid is no longer unique within the table, it's just unique within partition.
But the 10 byte combination on partition number plus rowid is unique.

The two rows (409, 33) will have a different uniqueness value, but the first (409, 33) and (409, 12) got the same uniqueness value (if empno is PI partitioned by deptno.

Dieter

Dieter

Rakesh2010 3 posts Joined 10/10
05 Oct 2010

Thanks Dieter ,
I got it now.
Many thanks !!!

dwtech 6 posts Joined 10/12
28 Mar 2013

Just reading about the fact regarding the upi and nupi index in the ppi table
 
There are two cases possible:
 
1. Partition index and primary index are same.
 
2. Partition index is not the part of PI.
 
 
 
In case 1, it is possible to make primary index as UPI while in case 2 TD wouldn't allow to create such table which has PI as UPI.
 
Let me explain this with an example.
 
Suppose a handset manufacturing company provides a product id to one of its handset and and all its attributes are true for 6 months, company changes the attributes(colour,Os version) of product in every 6 month but don't change the product id.
 
So for a date (e.g. 1 dec 2013), a product id have “A” attributes and for a date (e.g. 1 aug 2014), same product id can have different attributes. Both the entries need to be entered in database.
 
We can treat product id as primary index and date as partition column. So if we define a UPI on PPI table, I could not enter same product id twice in my database. Because for TD it'll be a unique record
 But for me it'll be a different record and it should be entered in the table. That’s why I put a partition on my date, so that I can search the attribute of a particular handset, which has a unique id worldwide, based on date.
 
In case my PI and partition column are same, table partition will happen based on product id and I wouldn't want duplicate attributes for the same product when I don't have anything else to make a difference between them.
 
Let me know if I am incorrect and provide you explanation as well.

AmolZ 1 post Joined 06/16
01 Aug 2016

How does it help when we choose partition on single column which is part of "multi-columned Unique Primary Index"?

kirthi 65 posts Joined 02/12
04 Aug 2016

It helps in Dynamic partition elimination if you use the partition column in the where Clause.

kamlesh114 7 posts Joined 05/15
24 Aug 2016

  CREATE MULTISET TABLE SUPPORTDB.TABLE1
(EMPNO INT,E_NAME VARCHAR(100),JOIN_DATE DATE )
UNIQUE PRIMARY INDEX (EMPNO)
PARTITION BY RANGE_N (JOIN_DATE BETWEEN '2015-01-01' AND '2016-12-31' EACH INTERVAL '1' MONTH)

I UNDERSTOOD THAT IN ABOVE CASE TD WONT ALLOW TO CREATE TABLE  BY READING ABOVE EXPLAINATIONS

IN BELOW SCENARION IT WILL ALLOW ME TO CREATE TABLE AS PPI COLUMN IS A PART OF UPI

BUT I AM NOT GETTING HOW TERADATA WILL AVOID FOR SEARCHING EMPO IN ALL PARTITION IF I UPDATE EMPNO IN BELOW TABLE AS THE DATA IS PARTIONED BY JOIN_DATE  NOT BY (EMPNO,JOIN_DATE)
  
  CREATE MULTISET TABLE SUPPORTDB.TABLE2
(EMPNO INT,E_NAME VARCHAR(100),JOIN_DATE DATE )
UNIQUE PRIMARY INDEX (EMPNO,JOIN_DATE)
PARTITION BY RANGE_N (JOIN_DATE BETWEEN '2015-01-01' AND '2016-12-31' EACH INTERVAL '1' MONTH)
 

kamlesh114 7 posts Joined 05/15
26 Aug 2016

Please comment..

ToddAWalter 316 posts Joined 10/11
26 Aug 2016

It is generally better to start a new thread for a new question...
 
It is correct that a UPI cannot be defined when it does not contain the partitioning column. This is because enforcing the uniqueness across all the partitions would be prohibitive cost.
 
In the second case, if both the EmpNo and the Join_Date are specified, then only a single partition will be probed to find the row, the access will be very direct. However if only the EmpNo is supplied, a full scan of the table will be required because the full PI is not specified.
 
It would be helpful if the desired access use cases were described. Guessing that it is desired to access a row directly by EmpNo and to access groups of rows by Join_Date or range on Join_Date and to enforce uniqueness by EmpNo, the solution would be to define the partitioning on Join_Date, A NUPI on EmpNo and create a USI on EmpNo. The USI would enforce the uniqueness and provide the direct access for a single EmpNo. The partitioning would allow the date and date range access.
 

kamlesh114 7 posts Joined 05/15
29 Aug 2016

Thank you very much Todd.
I got it what i needed.
 

You must sign in to leave a comment.