abin 11 posts Joined 04/11
26 Nov 2013
Explain plan for set table

Hi,
 
I have a requirement to convert around 50 tables from set to multiset. I am trying to weigh the trade-off of keeping a table as set against the performance of inserting records into it. I did an explain plan on insert statements for a table defined as set and then altered it as set. Both inserts give me the same explain plan. I was expecting some additional steps in the plan for the set table because of the system-level duplicate check.
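For reference, the comparison I ran looked roughly like this (table and column names here are just placeholders, not my real objects):
 
-- Two otherwise identical tables, one SET and one MULTISET
CREATE SET TABLE test_set_tbl
( cust_id  INTEGER,
  txn_dt   DATE,
  amount   DECIMAL(18,2)
) PRIMARY INDEX (cust_id);
 
CREATE MULTISET TABLE test_mset_tbl
( cust_id  INTEGER,
  txn_dt   DATE,
  amount   DECIMAL(18,2)
) PRIMARY INDEX (cust_id);
 
-- Both of these produce the same plan steps
EXPLAIN INSERT INTO test_set_tbl  SELECT cust_id, txn_dt, amount FROM staging_tbl;
EXPLAIN INSERT INTO test_mset_tbl SELECT cust_id, txn_dt, amount FROM staging_tbl;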
 
So my question is: does the explain plan show the additional steps involved in the system-level duplicate check performed for a set table?
 
Abin

VBurmist 96 posts Joined 12/09
26 Nov 2013

 
Hello Abin,
you have really provided the answer in the first paragraph: there are no additional steps. The phrase "defined as set and then altered it as set" has "set" mentioned twice, but I guess you mean one "set" and one "multiset".
 
The reason is that the duplicate check is performed neither before nor after the insert, but during the insert, and only for the rows being inserted. The check is performed at the AMP level, not at the system level. Why? Remember that rows in regular Teradata tables are stored in PI rowhash sequence (PPI and NoPI are a bit different). When new rows are added, they are not appended at the end of the table, but placed into the proper data blocks to preserve the rowhash sequence. Therefore, it is convenient to do the duplicate check during the insert: for a given row, the AMP locates the data block where the row should be inserted, and if a duplicate row already exists, it sits in that same data block (duplicate row - same PI value - same rowhash value). So the duplicate check happens right during the insert operation.
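As a small illustration, take a hypothetical SET table with a non-unique PI (names are made up):
 
CREATE SET TABLE orders_set
( cust_id  INTEGER,
  order_dt DATE,
  amount   DECIMAL(18,2)
) PRIMARY INDEX (cust_id);
 
-- The AMP owning ROWHASH(cust_id) locates the target data block for that
-- rowhash; any exact duplicate must already sit in that same block, so only
-- rows with the same PI value need to be compared.
INSERT INTO orders_set VALUES (100, DATE '2013-11-26', 50.00);
 
-- Inserting the identical row again is caught by the SET table's duplicate
-- row check (a single-row INSERT raises a duplicate row error, an
-- INSERT ... SELECT silently discards it); a MULTISET table would simply
-- store the second copy.
INSERT INTO orders_set VALUES (100, DATE '2013-11-26', 50.00);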
 
Regards,
Vlad.
 

M.Saeed Khurram 544 posts Joined 09/12
26 Nov 2013

I agree with Vlad, the number of steps will be the same for set and multiset tables, but the duplicate row check will be made at the block level, byte by byte.
 

Khurram

Raja_KT 1246 posts Joined 07/09
26 Nov 2013

Hi,

What is your data volume?
I feel that as the amount of data grows large, you will see the difference between set and multiset, because of the duplicate check for set.

Cheers,
Raja

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

VBurmist 96 posts Joined 12/09
26 Nov 2013

Hi Khurram,
Thank you for agreeing with me.   However, I believe that there is no need to dedicate an additional post to say only that. :)  
Regards,
Vlad.

M.Saeed Khurram 544 posts Joined 09/12
26 Nov 2013

Hi Vlad,
you are welcome, but this time I do not agree with you :)
 
 

Khurram

abin 11 posts Joined 04/11
27 Nov 2013

Thanks all for the responses.
 
Vlad,
Yes, the statement has a typo; it should read: "I did an explain plan on insert statements for a table defined as set and then altered it as multiset."
 
Raja,
The volume of data differs; I have around 60 tables to consider.
 
So, all in all, I am going to look at the data volume I am expecting for each table. If a table's volume is expected to remain small even after a long period, I am planning to keep it as set.
For larger tables (in row count), I will have to check the trade-off between keeping the table as set and doing a GROUP BY on all fields after changing the table to multiset. This is where I am stuck, because the addition of the GROUP BY shows up in the plan with additional time, and I am not able to tell whether that extra time is better or worse than the AMP-level duplicate check for a set table.
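For example, the two loading variants I am trying to compare look roughly like this (again, placeholder names):
 
-- Variant 1: keep the target as SET and let the AMP-level check drop duplicates
INSERT INTO target_set_tbl
SELECT cust_id, txn_dt, amount
FROM   staging_tbl;
 
-- Variant 2: change the target to MULTISET and de-duplicate explicitly
INSERT INTO target_mset_tbl
SELECT cust_id, txn_dt, amount
FROM   staging_tbl
GROUP BY cust_id, txn_dt, amount;   -- or SELECT DISTINCT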
 
Abin.

VBurmist 96 posts Joined 12/09
27 Nov 2013

Hi Abin,
it is possible to measure. Create two similar tables, one set and one multiset, with the same data. Insert the same new data into both tables. And then measure the CPU consumption in the DBQL journal, in the AMPCpuTime field. Even though the Explains are the same, the CPU consumption will probably differ because of the duplicate checks.
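Assuming DBQL query logging is enabled on your system, something along these lines should show the difference (view and column names as in the standard DBC views; adjust the filter to your own test queries, the table names below match the placeholder examples above):
 
SELECT  UserName,
        QueryID,
        AMPCPUTime,
        TotalIOCount,
        QueryText,
        StartTime
FROM    DBC.QryLogV            -- or DBC.DBQLogTbl, depending on your site setup
WHERE   QueryText LIKE 'INSERT INTO test_set_tbl%'
    OR  QueryText LIKE 'INSERT INTO test_mset_tbl%'
ORDER BY StartTime;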
 
Because of the AMP-level mechanics mentioned above, the duplicate check does not directly depend on how big the table is. Rather, it depends on the demographics of the Primary Index. If it is almost unique, then the duplicate checks are reasonable, because a new row has to be compared only with the rows that have the same PI value. However, if there are PI values with high numbers of rows per value, then the insertion of a new row forces a comparison with many rows with the same PI. That is CPU intensive.
 
Should you have DBQL results that would prove otherwise, please do not hesitate to share.
 
Thank you,
 
Regards,
Vlad.
 

abin 11 posts Joined 04/11
27 Nov 2013

Thanks, Vlad, for the suggestion. I will check this and get back in case of any more questions.
 
Abin
