Henshin1 11 posts Joined 11/10
21 Dec 2010
Acceptable Duplicate Row Count

Hi all,

I have a database with many tables (around 60-70), where roughly half of them, or slightly more, contain a small proportion of duplicate rows (between 0% and 3% per table on average). I know the ideal is 0% duplicate rows, but I was wondering whether there is some rule of thumb or industry standard for an acceptable amount of duplicate rows in a table.

Would 2% duplicate rows be considered a big problem, or would this be somewhat acceptable in most cases in the industry?
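For what it's worth, here is a minimal sketch of how you could measure that percentage yourself; the table and column names are invented, and I'm using sqlite3 just so the example is self-contained (on Teradata the same COUNT(*) vs. COUNT over SELECT DISTINCT comparison applies):

```python
import sqlite3

# Hypothetical table: one fully duplicated row out of four.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (cust_id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 10), (2, 20), (2, 20), (3, 30)])

# Duplicate percentage = (total rows - distinct rows) / total rows.
total = cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
distinct = cur.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT cust_id, amount FROM sales)"
).fetchone()[0]

dup_pct = 100.0 * (total - distinct) / total
print(f"{dup_pct:.1f}% duplicate rows")  # 25.0% duplicate rows
```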

Thanks all.

dnoeth 4628 posts Joined 11/04
22 Dec 2010

I would consider any duplicate row (outside of the staging area) to be unacceptable.

Not only because it contradicts the Relational Model (no PK), but also because it might screw up your queries/results.
How do you join to a table without uniqueness?

This must be handled during load.

If 2% of the rows in the payroll application are duplicates, I'd like to be one of those duplicates :-)
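A sketch of handling it during load, again with invented names and sqlite3: rows land in a staging table, and only distinct rows are moved into the target. (On Teradata, inserting into a SET table via INSERT...SELECT discards duplicate rows automatically; the explicit SELECT DISTINCT below mimics that.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE stg_sales (cust_id INTEGER, amount INTEGER)")
cur.execute("CREATE TABLE sales (cust_id INTEGER, amount INTEGER)")

# Raw load into staging, duplicates and all.
cur.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                [(1, 10), (2, 20), (2, 20)])

# Only distinct rows reach the target table.
cur.execute("INSERT INTO sales SELECT DISTINCT cust_id, amount FROM stg_sales")
loaded = cur.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(loaded)  # 2 -- the duplicate staging row was dropped
```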



Vador 36 posts Joined 08/07
08 Jan 2011

Agree with Dieter; however, in some cases (I have seen them in the telco industry) dups are functionally required. The number of acceptable dups in such a case depends on the number of hash collisions you may have with your PI. If the number of hash collisions on your PI is less than 100,
then I would accept a maximum of 100 dups in the multiset table.
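That threshold check can be sketched like this (invented table and limit, sqlite3 standing in for a multiset table, which accepts repeated rows): count how often each full row repeats and compare the worst case against the agreed maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cdr (call_id INTEGER, duration INTEGER)")

# The row (1, 60) appears three times; (2, 30) appears once.
cur.executemany("INSERT INTO cdr VALUES (?, ?)",
                [(1, 60)] * 3 + [(2, 30)])

# Worst-case duplication of any single full row.
max_dups = cur.execute(
    "SELECT MAX(n) FROM "
    "(SELECT COUNT(*) AS n FROM cdr GROUP BY call_id, duration)"
).fetchone()[0]
print(max_dups)  # 3 -- well under a limit of 100
```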
