katakoti 21 posts Joined 11/12
20 Nov 2012
SET & Multiset Tables

If I use a SET table, then the Teradata database will perform a duplicate-row check on that table while loading the data.
When I use a UPI, it eliminates the duplicate-row check.

Am I correct?
Could someone please explain in detail?

Harpreet Singh 101 posts Joined 10/11
20 Nov 2012

You are correct. Basically, the conditions below can help you:
1. The table is SET and we create a UPI on it: Teradata checks for duplicate PI values and rejects the duplicates. There is no need to scan the entire row, and all conditions are satisfied.
2. The table is MULTISET with a UPI: simply checking that there are no duplicate PI values satisfies all conditions, so there is no need to scan the complete row.
3. The table is SET with a NUPI: the complete row needs to be compared to avoid duplicate rows, since the PI itself can have duplicates.
4. The table is MULTISET with a NUPI: no checks need to be done, as duplicates are allowed.
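The four combinations above can be sketched in Teradata DDL (table and column names here are made up for illustration):

```sql
-- 1. SET + UPI: uniqueness is enforced on the PI value alone
CREATE SET TABLE cust_set_upi (
    cust_id   INTEGER,
    cust_name VARCHAR(100)
) UNIQUE PRIMARY INDEX (cust_id);

-- 2. MULTISET + UPI: the UPI check alone still guarantees unique rows
CREATE MULTISET TABLE cust_ms_upi (
    cust_id   INTEGER,
    cust_name VARCHAR(100)
) UNIQUE PRIMARY INDEX (cust_id);

-- 3. SET + NUPI: whole-row comparison is needed among rows
--    sharing the same PI value to reject duplicate rows
CREATE SET TABLE cust_set_nupi (
    cust_id   INTEGER,
    region_id INTEGER
) PRIMARY INDEX (region_id);

-- 4. MULTISET + NUPI: no duplicate checking is performed at all
CREATE MULTISET TABLE cust_ms_nupi (
    cust_id   INTEGER,
    region_id INTEGER
) PRIMARY INDEX (region_id);
```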
I hope that helped a little bit.

katakoti 21 posts Joined 11/12
22 Nov 2012

Thanks Harpreet...!

VandeBergB 182 posts Joined 09/06
22 Nov 2012

Your option four does indeed eliminate the duplicate row and PI checks, but it is also the most susceptible to duplicate rows. The duplicate tuple check should then be handled by some other mechanism: Changed Data Capture processes, ELT, or a Unique Secondary Index (USI). The USI will maintain uniqueness without affecting the distribution of the data across the AMPs.
At some point you'll hear or read an Alison Torres presentation: the first concern when picking the PI on a Teradata table is the primary access path, with the caveat of accepting minor skew. For that reason, you'll see a lot of tables with NUPIs and USIs on them. Depending upon the size of the table, some quick testing will show that the increased I/O of the USI subtable is offset by having AMP-local joins with large and/or frequently joined tables.

Some drink from the fountain of knowledge, others just gargle.

Harpreet Singh 101 posts Joined 10/11
22 Nov 2012

Right, VandeBergB.
When we can declare some column in the table as unique, it helps query performance, since checking duplicates on a column is much cheaper than comparing whole rows for duplicate values.
A MULTISET table allows a whole row to be a duplicate of another, and it should be used only when specific requirements call for it.

abbonizio 1 post Joined 04/11
06 Dec 2012

Option one will incur the overhead of the UPI duplicate check plus the additional uniqueness check of the SET table. Option two is the one that makes the most sense for what you are asking.

jlasheras 6 posts Joined 08/12
12 Dec 2012

Hi Harpreet,
I'd recommend you review option #3 in your first answer: if the table is SET with a NUPI, the complete row needs to be scanned *unless* a unique secondary index (USI) is also defined, in which case the duplicate check will be performed on the USI alone. A quite common practice in physical modeling is to use the foreign key as a NUPI, for best joining with a major entity on its primary key, and the primary key as a USI, which enforces data integrity with the added benefit of low resource utilization for duplicate checking.
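That FK-as-NUPI, PK-as-USI practice might look like this in DDL (a hypothetical sketch; table and column names are not from the thread):

```sql
-- Child table: the FK (customer_id) is the NUPI, so joins to the
-- customer table's PK are AMP-local; the PK (order_id) is a USI,
-- enforcing integrity with a cheap single-column duplicate check.
CREATE SET TABLE orders (
    order_id    INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    order_date  DATE
) PRIMARY INDEX (customer_id)   -- NUPI on the foreign key
UNIQUE INDEX (order_id);        -- USI on the primary key
```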
Hi abbonizio,
I believe there is no such thing as an additional uniqueness-check overhead for option #1; the duplicate row check is simply performed on the UPI.

Harpreet Singh 101 posts Joined 10/11
13 Dec 2012

Yes, to prevent a full row comparison for duplicates in a SET table with a NUPI, the normal practice is to add a uniqueness constraint on some column or columns.

ToddAWalter 316 posts Joined 10/11
14 Dec 2012

This is a good discussion and a common question. It has become common to use MULTISET tables or unique indexes/constraints to avoid duplicate row checks.
However, this can be overdone. If a NUPI has a reasonable number of duplicates, then the cost of the duplicate row checking will be less than the cost of maintaining the separate structures, or than the impact of getting unintended duplicate rows in the table.
"Reasonable" is defined as less than a couple of data blocks of duplicate records.
