All Forums General
chill3che 99 posts Joined 10/12
13 Feb 2014
Implications of having a part of shared disk on all AMPS - join performance

Greetings experts,
Can you please explain whether the following can be achieved or not.
We are having 137 AMPS, joining 2 tables and one of the table_a has 98,016,938 rows.  In explain plan, I could see that table_a is duplicated across all amps
(98,016,938 rows * 137 AMPs = 13428320506 rows)
Instead, if we have a central disk shared by all amps (all amps has its individual disks, besides assume a new small central disk which is shared by all amps, here only 98,016,938 rows can be duplicated instead of 98,016,938 * 137 AMPS which might help in CPU, I/O.
Now the joins can also be performed as the AMP's has the table_b data scattered on its AMPs and entire table_a data in a shared disk and hence can perform join.
It CONTRADICTs the Teradata's shared nothing architecture. 
But what are the pros and cons of this approach? (like reduce the number of records duplicated, waste of space as having a disk only for sharing {duplicated data} etc)

Thanks, Cheeli
Adeel Chaudhry 773 posts Joined 04/08
15 Feb 2014

In my opinion .... it will cause a lot of ambiguities and will be prone to error .... or cause a deadlock/bottleneck for the shared space .... hence performance will degrade .... as multiple queries might want to have shared space.
On the other hand .... if the indexes, statistics and writing-SQL is done correctly, it shouldnt be duplicating this table across all amps.
Whats the size of other tables involved in the query? is this 'table_a' driving table of the query?

-- If you are stuck at something .... consider it an opportunity to think anew.

You must sign in to leave a comment.