All Forums Database
04 Feb 2016
Find Only Duplicates

Hi All,
I have very big table and it has around 250 columns and this table may have duplicates.
this table does not have any UPI or USI or event primary index also based on many columns.
Now my requirement to find the duplicate so tell me what is the best way to find the dups.
I am aware about GroupBY , ROW_NUMBER, UNION. But I am thinking these all will take more CPU and will hit performance.
And its not feasible to group by on all 250 columns or use row_number on these many columns.
So do we have another approach to finnd only duplicate.
 
Thanks in Advance..
AG

ulrich 816 posts Joined 09/09
04 Feb 2016

I don't think you will have much other options the the big group by or the row_number() with big partition.
Sorted export and doing the check via an external program will also require huge resources...
That's the downside of of big multiset tables if you require unique rows and don't enforce it during ETL processes...

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

Sankalp.C 45 posts Joined 08/13
04 Feb 2016

Hi AG,
As not chk had been set while populating the data hence what ever you do can not prevent this senarion to happne in near future. Comming back to your question It have to be a all amp operation in order to find the duplicated in existing table entry, Hence preapare yourself for a BIT CPU use :). Ans is very simle in taht case just do a count using group by having with a greater Than one condition.
 
- Sankalp
 

dnoeth 4628 posts Joined 11/04
08 Feb 2016

If you need to delete the duplicate rows the easiest way should be a SELECT DISTINCT into a new table, drop/rename.
But there should be a logical Primary Key for this table, so you might also have PK violations which are not duplicate rows and then you need COUNT to find them.
To avoid duplicates in future you should try to switch to MERGE.

Dieter

You must sign in to leave a comment.