All Forums Database
slakshmipathy 10 posts Joined 02/14
11 Apr 2015
Pattern match and Extracting the words from column

Hi,
I have a requirement to match the pattern and extract the word as explained below.
Ex: Column value  = "A_B_C.D_E_F NOT NULL HAVING COUNT EQ 0 SAMPLE A_B_C.H_I_J"
I need to match the pattern A_B_C and take the word after dot(.) i.e D_E_F and H_I_J
Expected output: D_E_F,H_I_J
Note :  The number of occurence of this pattern is also not constant.
Could somone suggest me a good way to achieve this.
 
Thanks,
Sri

Thanks, Sri
manib0907 61 posts Joined 04/15
11 Apr 2015

How are you providing this pattern as input? Is it given at the run time?

Cheers,
Mani

slakshmipathy 10 posts Joined 02/14
12 Apr 2015

Hi ,
Pattern is constant and we can hard code the pattern in query directly..
Im expecting something from regex but I couldn't  find the proper documents for the same.

Thanks,
Sri

dnoeth 4628 posts Joined 11/04
12 Apr 2015

Hi Sri,
this is a RegEx to find the word after 'A_B_C.', the 1st group searches for the pattern without adding it to the result and the 2nd group extracts the following word:

REGEXP_SUBSTR(x, '(?<=A_B_C.)(.+?\b)')

 
But this only works for the first occurence. If the maximum number is known you could use

REGEXP_SUBSTR(x, '(?<=A_B_C.)(.+?\b)',1,2)
REGEXP_SUBSTR(x, '(?<=A_B_C.)(.+?\b)',1,3)

to get the 2nd, 3rd and so on.
If the number of occurences is high you might use REGEXP_SPLIT_TO_TABLE to get all as rows and then XMLAGG to concat them back.

Dieter

slakshmipathy 10 posts Joined 02/14
12 Apr 2015

Hi Dieter, 
Thanks for your solution. but the number of occurence is not constant in my case so as per your suggestion,  I have to use REGEXP_SPLIT_TO_TABLE and XMLAGG.
But could you please let me know is there any books or documents to learn these functions? 
 

Thanks,
Sri

dnoeth 4628 posts Joined 11/04
12 Apr 2015

Hi Sri,
there's no Teradata-specific documentation on regex, but there are many books and online resources available, simply google for "regular epression". There are different dialects for regex, but Teradata usually follows the Perl variant (Perl Compatible Regular Expressions = PCRE).
XMLAGG is documented in the XML manual, hopefully it's installed on your system.
What's your Teradata release?
 

WITH cte (inkey,x) AS 
   (SEL 1, 'A_B_C.D_E_F NOT NULL HAVING COUNT EQ 0 SAMPLE A_B_C.H_I_J'  AS x)
SELECT
   outkey
  ,OREPLACE(XMLAGG(REGEXP_SUBSTR(token, '(.+?\b)')
            ORDER BY tokennum) (VARCHAR(10000))
           ,' ',',')
FROM
  TABLE(REGEXP_SPLIT_TO_TABLE(cte.inkey, cte.x, '(A_B_C.)', 'c')
  RETURNS(outkey INTEGER, tokennum INTEGER, token VARCHAR(100) CHARACTER SET UNICODE)) AS dt
GROUP BY 1

There's currently an open Tech Alert (TA3744) on failing queries using REGEXP_SPLIT_TO_TABLE, so this might fail on your real data with "Failure 9134 Unexpected failure while attempting to convert UTF8 to UTF16".

Dieter

slakshmipathy 10 posts Joined 02/14
12 Apr 2015

Hi Dieter,
That sounds great. Let me check PCRE syntax to understand better.
We have TD14
Thanks a lot for all your help !!
 

Thanks,
Sri

You must sign in to leave a comment.