All Forums Analytics
David Korb 8 posts Joined 07/09
27 Oct 2009
Validate column data

Hi,

I'm trying to validate that each character in a varchar(100) column is between 0 and 9 on a 250M-row table. Any rows that are not valid will be ignored in an INSERT statement into a base table. However, this is giving me a hard time. I tried using a tally table with a substring comparison in an EXISTS clause, but the data are just too huge. Is there an efficient way to validate my data?

Thanks,
David
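For reference, the tally-table/EXISTS approach described above might look something like this (a sketch only; it assumes a numbers table `tally(n)` covering 1..100 already exists, and the `staging`, `base_table`, and `order_number` names are illustrative):

```sql
-- Hypothetical sketch: insert only rows whose value is all digits.
-- Assumes tally(n) holds the integers 1..100.
INSERT INTO base_table (order_number)
SELECT s.order_number
FROM staging s
WHERE NOT EXISTS (
    SELECT 1
    FROM tally t
    WHERE t.n <= CHARACTER_LENGTH(s.order_number)
      AND SUBSTRING(s.order_number FROM t.n FOR 1) NOT BETWEEN '0' AND '9'
);
```

On a 250M-row table the cross-join against the tally table is what makes this expensive, which is presumably the performance problem described.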

Jimm 298 posts Joined 09/07
27 Oct 2009

Check whether you have the Oracle UDFs installed - you can get them from the Teradata Download Centre. There is an oTranslate function - Oracle's TRANSLATE - which can check them for you. See: http://teradataquestions.stackexchange.com/questions/16/the-format-or-data-contains-a-bad-character
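If those UDFs are installed, the check might be sketched like this (assumption: oTranslate follows Oracle TRANSLATE semantics, so a dummy leading character pair is used because an empty replacement string would yield NULL; table and column names are illustrative):

```sql
-- Hypothetical sketch: strip the digits 0-9 from the value; if nothing
-- is left, every character was a digit. The leading 'x'/'x' pair is the
-- usual Oracle TRANSLATE idiom to avoid an empty to-string.
SELECT order_number
FROM staging
WHERE oTranslate(order_number, 'x0123456789', 'x') = '';
```

Depending on how empty strings compare on your system, `CHARACTER_LENGTH(oTranslate(...)) = 0` may be the safer test.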

David Korb 8 posts Joined 07/09
27 Oct 2009

I'm a downstream user with few permissions, and I see no UDFs in the metadata. My work is pretty much limited to pure SQL.

RB185048 33 posts Joined 09/08
28 Oct 2009

I'm not sure I've understood your question - do you want to keep only the valid records, or do you want to see the invalid ones? If you want to see the invalid records (records containing characters A to Z), log in through BTEQ, issuing .SET SESSION TRANSACTION ANSI; before you log on, then:

CREATE VOLATILE TABLE test_acct
( acct_num  VARCHAR(20)
, acct_num2 VARCHAR(20)
) ON COMMIT PRESERVE ROWS;

INSERT INTO test_acct
SELECT UPPER(acct_num), LOWER(acct_num)
FROM accounts;

Now you can pull out the records that contain letters with:

SELECT *
FROM test_acct
WHERE acct_num <> acct_num2;

Hope this works :)
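The same UPPER/LOWER trick can be applied in one step, without the volatile table, since letters are the only characters that change under case conversion (a sketch; table and column names are illustrative):

```sql
-- Rows containing at least one alphabetic character:
SELECT *
FROM accounts
WHERE UPPER(acct_num) <> LOWER(acct_num);
```

Note the limitation: digits, punctuation, and other specials compare equal under both casings, so this detects letters only, not every non-digit character.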

David Korb 8 posts Joined 07/09
29 Oct 2009

That's interesting as far as alphanumeric data go, but these strings also include various non-alpha characters. However, I solved the problem through direct analysis of each character position by grouping (with n set to each position in turn):

SELECT SUBSTRING(order_number FROM n FOR 1), COUNT(*) AS Total
FROM Orders
GROUP BY SUBSTRING(order_number FROM n FOR 1)
ORDER BY SUBSTRING(order_number FROM n FOR 1);

In the end I found there were repeated patterns which could be filtered easily. If new patterns appear in the future, the process will break and the script will have to be modified. The data are pretty regular, though, so it's not too bad really.

Thanks for the suggestions!
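That per-position profile can also be produced for every position in one pass by joining against a small numbers table, which avoids editing n by hand between runs (a sketch; it assumes a `tally(n)` numbers table as in the original question):

```sql
-- Hypothetical sketch: character frequency at every position at once.
SELECT t.n AS pos,
       SUBSTRING(o.order_number FROM t.n FOR 1) AS ch,
       COUNT(*) AS Total
FROM Orders o
JOIN tally t
  ON t.n <= CHARACTER_LENGTH(o.order_number)
GROUP BY 1, 2
ORDER BY 1, 2;
```

This is the same tally-table cost as the validation query, so it is a one-off profiling run rather than something to embed in the nightly load.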
