All Forums Database
Usri 2 posts Joined 05/12
19 May 2012
string Matching using teradata sql

Hi,

I need to implement the below thing.

 

 

ID

Name

Key

1

ABC Corp Inc.

KEY-456

2

ABC Corp Inc India.

NULL

3

4 D ASSOCIATES DECORATION WORKS

KEY-123

4

4 D ASSOCIATES DECORATION

NULL

5

4K FZ-LLC

KEY- 288

6

4K FZ-L

NULL

7

4K FZ-

KEY - 277

 

 

 

Update the Null Key column value , based on the most similar name found in the table,Suppose for ID 2 - "KEY-456" has to be updated similarly, for ID 6 - "KEY- 288" has to be updated.

I am new to teradata programming..pls help.

Mahs 32 posts Joined 04/11
20 May 2012

Assuming that first 5 letters match exactly for similar names..

 

UPDATE a

FROM

(

  SELECT SUBSTR(Name,1,5) Name,key FROM table 

  WHERE key IS NULL

) a,

(

  SELECT SUBSTR(Name,1,5) Name,MAX(key) key FROM table 

  WHERE key IS NOT NULL

  GROUP BY 1

) b

SET key=b.key

WHERE

a.Name=b.Name

ulrich 816 posts Joined 09/09
20 May 2012

hm

id 1 and 2 are raising the question if the key value need to be 100% equal to the matching part of the null rows.

id 1 has

ABC Corp Inc.

and

id 2 has

ABC Corp Inc India.

so NOT Inc.

is this correct?

Or can we assume that the keys need to be a substr of the null key names?

 

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

Usri 2 posts Joined 05/12
20 May 2012

Thank you for your replies!!!

The Null Key names shuold be substr of the Not null Key names, and we have to update the not Null key based on the most similar Name found in the table.

In the above case, We can update the ID 2 Record key with Id 1 Key, as the name is matching till the second word,  Unless we find another nearest match in the table.

the problem with the above approach is, if we assume  till 5 chars, the query will return above a 10000 records for each SUBSTR :( ..

No of words in the Name column varies from 3-15.

 

 

dnoeth 4628 posts Joined 11/04
20 May 2012

"for ID 6 - "KEY- 288" has to be updated."
Why "KEY- 288" and not "KEY - 277"? 

What if there's "4 D ASSOCIATES DECORATION WORKS' and "4 C ASSOCIATES DECORATION WORKS"

I doubt you'll be able to write all your requirements in a simple SQL statement, there's a reason why that expensive address cleaning software exists :-)

Dieter

Dieter

You must sign in to leave a comment.