All Forums General
m.sundaram 4 posts Joined 09/12
20 Dec 2014
Data Quality checks on the user data

Hi - I have a requirement to do a data quality check on the data entered by the Users, the data quality check should make sure the data does not contains spaces (leading/trailing),special characters and only cotains numbers and alphabets....the Volume of data ranges from Minimum few thousands to 100 millions ...so the data quality check should runs with a good perfomance ..any Ideas please much appreciated
I had a few idea on the data quality check on the Spaces alone (Trailing/leading) and also it did a good perfomance
select sum(char_length(name)) - sum(char_length(trim both from name ) from X
Say if above the query retruns more than 0, then i consider the data has some apces and fail the process...
Simililary does anyone has any data quality ideas for Special characters,spaces etc ?
 

Raja_KT 1246 posts Joined 07/09
21 Dec 2014

If is is for a small and one-off situation , you can develop scripts. Teradata has Miner's Profiler, DQRM, if you have them. You can check. But many advanced features are also available in Ataccama's solutions for DQ, including MDM....
Tools are more convenient and faster.

Raja K Thaw
My wiki: http://en.wikipedia.org/wiki/User:Kt_raj1
Street Children suffer not by their fault. We can help them if we want.

dnoeth 4628 posts Joined 11/04
22 Dec 2014

Your requirements should be covered by some simple regular expression using REGEXP_SIMILAR

Dieter

m.sundaram 4 posts Joined 09/12
22 Dec 2014

Hi dieter,
I referred to one of your another post to use the otranslate
CHAR_LENGTH(oTranslate(col, 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', '')
However my sql query returns an error like 3706 column name does not match a defined type name ...
 
 
 

m.sundaram 4 posts Joined 09/12
22 Dec 2014

And the teradata version i use is 13.10.07.18

dnoeth 4628 posts Joined 11/04
22 Dec 2014

OTranslate plus regular expressions are implemented in TD14, before there's only oTranslate as part of the Oracle UDFs, ask your TD support for the latest release.
 

Dieter

m.sundaram 4 posts Joined 09/12
22 Dec 2014

Thanks Dieter, Any ideas for TD13 is much apppreciated, currently the production is still in TD13 and im not sure when the whole company might upgrade to TD14
 

You must sign in to leave a comment.