All Forums Database
24 May 2006
Comparing two tables

Hi GurusI have two huge tables TABLE-A and TABLE-B (200 Million rows in each table)Now I am checking to see if both the table are identical or not.one way of testing it is sel * from table-Aminussel * from table-Band then sel * from table-Bminussel * from table-Ais there a better/faster way of doing this. Each table has more than 50 columns.Thanks

dnoeth 4628 posts Joined 11/04
26 May 2006

MINUS: spool both tables & redistribute -> sort distinct -> exclusion merge joinIs the table SET or MULTISET? If it's MULTISET you'll need MINUS ALL, if it's SET, then MINUS ALL is more efficient, too.MINUS ALL: spool both tables & redistribute -> sort -> minus all joinBut spooling & redistributing is a large overhead, so if it's a SET table the most efficient way is probably using NOT EXISTS: just a direct exclusion merge joinOf course it's easier to code a SEL * FROM T1 MINUS ALL SEL * FROM T2instead of a SEL * FROM T1 WHERE NOT EXISTS (SEL * FROM T2 WHERE T1.col1 = T2.col2 AND ... T1.col50 = T2.col50) but it's probably worth the effort (and you can create the code using dbc.columns)Dieter

Dieter

26 May 2006

Thank you very much!I really appreciate it.

Ghalia 11 posts Joined 12/15
14 Jan 2016

Hello,
Please i just want to verify that i well understood :  you are saying that NOT EXISTS is better  than MINUS / MINUS ALL , that's it ?
Can you give me more details please ?
 
Many thanks.
 
Ghalia

Fred 1096 posts Joined 08/04
14 Jan 2016

As Dieter said (years ago): MINUS / MINUS ALL will always spool / redistribute / sort both tables.
If you code a NOT EXISTS, the optimizer will take advantage of the PI being the same and just do a direct AMP-local join.
Note that the results for NOT EXISTS are equivalent to MINUS, not MINUS ALL. That's fine for SET tables, but not for a MULTISET table with duplicate rows.

You must sign in to leave a comment.