All Forums Database
lgscheurich 27 posts Joined 09/06
09 Jan 2015
Change data capture

Hoping that someone can help with a situation I'm in.  We have a source table that we're pulling in from SQL Server where the source is 3 million rows, and the table has about 100 columns in it.  Once we pull it in, we have to do a full table comparison to see if any of the columns changed since the last load (CDC).  I've tried using a MINUS, and also a LEFT OUTER JOIN, but in both cases, I run out of spool space because of having to coalesce all of the columns...many of which are null.  I'm looking for suggestions on the best way to do this.
 
Thanks!
Larry

Raja_KT 1246 posts Joined 07/09
09 Jan 2015

Few ideas that crop up in my mind :)

 

100 columns!!!! Comparing all columns, source and target, database resource consumption will be high.Of course , if necessary(very rarely), you are 100% sure of any change, that is in vantage. Cyclic Redundancy check(CRC-code) is another option may be better but not that good too.

 

Can you ask the upstream folks to produce delta? Or you can think of transaction log mechanism where it sniffs for changes and mark the changes. SQL server must have in-built feature for CDC,or you can check with some tools.Triggers can affect performance, if it triggers for changes.Check te replication services available and do some programming to capture changes. Else you have to think of candidate columns that change, like timestamp,date, indicator fields.... if you have any.

 

End of the day, it is project specific requirement and data. So make choice judiciouly to live happily :)

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

dnoeth 4628 posts Joined 11/04
10 Jan 2015

Hi Larry,
both joining on COALESCE and MINUS will result in redistribution steps, you better switch to NOT EXISTS instead:

SELECT *
FROM t1  
WHERE NOT EXISTS
(SELECT * FROM t2
WHERE (t1.col1 = t2.col2 OR (t1.col1 IS NULL AND t2.col1 IS NULL))
  AND (t1.col2 = t2.col2 OR (t1.col2 IS NULL AND t2.col2 IS NULL))
  AND (t1.col3 = t2.col3 OR (t1.col3 IS NULL AND t2.col3 IS NULL))
...
  AND (t1.col100 = t2.col100 OR (t1.col100 IS NULL AND t2.col100 IS NULL))

Assuming both table share the same PI this will result in an AMP-local join without any preparation step. No need to think about which value to use in COALESCE and the code is easily created automatically.

Dieter

You must sign in to leave a comment.