All Forums Database
Boopathi15 11 posts Joined 12/13
27 Nov 2015
Avoiding Duplication on All Amps

Hi
I'm trying to join two tables, Left table "TableA" being (30 GB and 90 milion rows) with the another table "tableB" ( 2.5 GB and 21 million rows). 
Following is the sql
Select * from  TableA t1 left join TableB t2
on t1.ColumnX=t2.columnX
and T1.ColumnX is not null;
DDLs:
TableA
CREATE SET TABLE TableA
(
column1 Integer Not null,
column2 BigInt Not null,
ColumnX Integer compress,
column3 VArchar(100),
column4 Varchar(200),
.
.
Column120 Integer)
Unique Primary Index(Column1,Column2)
Index(Column1);
 
TableB:
CREATE SET TABLE TableB
(
ColumnX Integer Not NULL,
Column1 varchar(100),
Column2 varchar(50),
.
.
Column30 VArchar(50))
Unique Primary Index (ColumnX);
 
Problem:
The TableB is always duplicated across ALL AMps which is causing my code to run for long time and lot of CPU seconds. IS there Any way we can avoid this Duplication.
 
 

ulrich 816 posts Joined 09/09
30 Nov 2015

Do you have stats collected on ColumnX on both tables?

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

ToddAWalter 316 posts Joined 10/11
30 Nov 2015

The alternative plan is to redistribute TableA which appears to be a fairly wide table, lots of bytes to redistribute. Are there lots of nulls in columnx in TableA? Are there lots of duplicate values in columnX in TableA? 
 
The rest of the plan is not provided and evidence is of provided to show that the duplication is the costly part of the plan. Since columnx is not unique in TableA, a product join needs to be done within value of columnx in tableB, this join step is probably the costly one not the redistribution.

You must sign in to leave a comment.