SD2012 9 posts Joined 05/12
18 Feb 2014
Regular Expression On Teradata 14.0

Hi All,
I've worked with Oracle 10g and 11g for more than eight years and have written a significant number of queries using regular expressions in SQL. They are really handy once you know how to use them: a single SQL statement can save a lot of pain, and performance is usually better than achieving the same functionality with multiple SQL queries or PL/SQL procedures.
For the last couple of years I've been working on Teradata, and on some occasions I missed features like these, which let me manipulate data easily with a regular expression. So I was pretty excited when I heard that Teradata introduced regular expressions in version 14.0.
As a result, I tried out all the features that I think can be handy for various scenarios, and the successful queries follow. There are two occasions where Teradata was only partially able to manipulate the strings; I've checked the latest Teradata manual but was unable to find a solution. So I'm hoping other forum members will contribute here to make this thread useful for everyone, and I'll post as soon as I get answers on those partial conversions.
For better understanding, I've shown each column's actual value and its value after transformation in the output. That should make the examples easy to grasp. :)
Case 1,

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*)$','\1 \2\3') AS COL_VAL;

COLA             COL_VAL
---------------- ----------------------------------------
SatyakiDe        Satyaki De

 
Case 2,

select regexp_replace('919047242526','^([[:digit:]]{2})([[:digit:]]{10})','+\1 \2') COL_VAL;

COLA         COL_VAL
------------ ---------------
919047242526 +91 9047242526

 
Case 3,

select regexp_replace('+++C','^([[:punct:]]{2})([[:punct:]]{1})(.*)$','\1\3') COL_VAL;

COLA COL_VAL
---- -----
+++C ++C

 
Case 4,

select initcap(regexp_replace(regexp_substr('satyaki.de@mail.com','[^@]+'),'(.*)(\.)(.*)','\1 \3')) COL_VAL;

COLA                             COL_VAL
-------------------------------- --------------------------------------------------
satyaki.de@mail.com             Satyaki De

 
Case 5,

select regexp_replace('100011001','([[:digit:]]{3})([[:digit:]]{2})([[:digit:]]{4})','XXX-XX-\3') as COL_VAL;

COLA             COL_VAL
---------------- --------------------
100011001        XXX-XX-1001

Case 6,

select regexp_replace('123456789','([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{3})','\3.\2.\1') as COL_VAL;

COLA      COL_VAL
--------- ---------------
123456789 789.456.123

In all these cases you can pass a column name as the argument, so an entire column's values can be parsed the same way.
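For anyone who wants to experiment with these patterns outside Teradata, here is a rough sketch of Cases 1, 2 and 5 using Python's `re` module. This is an illustration only, not Teradata code: Python uses `\d` and `[a-z]`/`[A-Z]` instead of POSIX classes like `[[:digit:]]`, and `re.sub` plays the role of REGEXP_REPLACE (Case 1 here uses the lower-then-upper form rather than the greedy `^(.*)` form).

```python
import re

# Case 1: insert a space where a lowercase letter meets an uppercase one
name = re.sub(r'([a-z])([A-Z])', r'\1 \2', 'SatyakiDe')

# Case 2: split a 12-digit number into country code + subscriber number
phone = re.sub(r'^(\d{2})(\d{10})$', r'+\1 \2', '919047242526')

# Case 5: mask the first five digits of a 9-digit identifier
masked = re.sub(r'^(\d{3})(\d{2})(\d{4})$', r'XXX-XX-\3', '100011001')

print(name)    # Satyaki De
print(phone)   # +91 9047242526
print(masked)  # XXX-XX-1001
```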
I hope my solutions help others; at the same time, I'm expecting others to post their relevant regular expression scripts here, so everyone can find most regular-expression-related queries in a single place. :)
Really appreciate your time to read this thread.
Regards.
Satyaki De.

Regards

Satyaki De

Blog: satyakide.wordpress.com

Adeel Chaudhry 773 posts Joined 04/08
19 Feb 2014

Very nice post, Satyaki. You could also evolve this post into an article and publish it on Developer Exchange.
 
Good sharing.

-- If you are stuck at something .... consider it an opportunity to think anew.

jinli 10 posts Joined 11/12
19 Feb 2014

Regular expressions were introduced earlier, since v13.xx (not from v14). If you have a good grasp of regular expressions, it's easy to apply them to TD's regex functions, or to any other RDBMS's, once those functions are available.

M.Saeed Khurram 544 posts Joined 09/12
19 Feb 2014

Nice post Satyaki, As Adeel advised, you should publish it on Teradata Blogs :)
 

Khurram

SD2012 9 posts Joined 05/12
19 Feb 2014

Hi Jinli,
 
I don't know where you got the information that regular expressions were introduced in 13.0. Please specify the exact version, say 13.0.0.1, not 13.x.x. Prior to 14.0, TD had UDFs to serve the same purpose.
 
Please find the following post in this forum only ->
 
http://forums.teradata.com/forum/extensibility/regexp-like
 
Anyway, if you have any link that suggests otherwise, please provide it here so we can update this thread as well.
 
Regards.
 
Satyaki De.


SD2012 9 posts Joined 05/12
20 Feb 2014

Hi Admin,
 
I've noticed that stray spaces were introduced in some of the queries as originally posted; I'm not sure how that happened. Such queries have to be corrected by removing those spaces.
 
For example, 
 
Case 2,
 

select regexp_replace('919047242526','^([[:digit:]]{2})([[:digit:]]{10})','+\1 \2') COL_VAL;

 
Here the space after ':' in the second [[:digit:]] class has been removed.
 
Cases 3, 5 and 6 need the same correction.
 
Case 3,
 

select regexp_replace('+++C','^([[:punct:]]{2})([[:punct:]]{1})(.*)$','\1\3') COL_VAL;

 
Case 5,
 

select regexp_replace('100011001','([[:digit:]]{3})([[:digit:]]{2})([[:digit:]]{4})','XXX-XX-\3') as COL_VAL;

 
Case 6,
 

select regexp_replace('123456789','([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{3})','\3.\2.\1') as COL_VAL;

 
An extra space inside a character class will not produce the desired result, so pay attention to these small details. Unfortunately, I cannot edit the earlier post, so I've pasted the corrected queries here again. Sorry for that.
 
And, I've tested all these queries in the following two versions -
 

select * from dbcinfo;

	InfoKey	InfoData
1	VERSION	14.10.00.02
2	RELEASE	14.10.00.02
3	LANGUAGE SUPPORT MODE	Standard




	InfoKey	InfoData
1	VERSION	14.10.01.05
2	RELEASE	14.10.01.04
3	LANGUAGE SUPPORT MODE	Standard

 
Hope this gives you more clarity. :)
 
One more thing I would like to clarify: my intention here is to describe more features of the regexp_(like/substr/instr/replace) functions.
 
Regarding version 13, let us check whether it has these regexp functions or not -
 

SELECT * FROM dbc.dbcinfo;

	InfoKey	InfoData
1	VERSION	13.00.00.15
2	RELEASE	13.00.00.15
3	LANGUAGE SUPPORT MODE	Standard

SELECT * FROM dbc.dbcinfo;
	InfoKey	InfoData
1	VERSION	13.10.07.12
2	RELEASE	13.10.07.12
3	LANGUAGE SUPPORT MODE	Standard

 

select regexp_replace('SatyakiDe','^(.*)([[:upper:]]{1,})(.*)$','\1 \2\3') AS COL_VAL;
                                   $
  *** Failure 3706 Syntax error:  expected something between '(' and the string 'S' keyword.
                      Statement# 1, Info =35
 *** Total elapsed time was 1 second.

 
Hope this will help. :)
 
Regards.
 
Satyaki De


SD2012 9 posts Joined 05/12
20 Feb 2014

Thanks Adeel & Khurram for the suggestion.
 
Yes. I'll prepare that & share over there, too.
 
Regards.
 
Satyaki De


SD2012 9 posts Joined 05/12
20 Feb 2014

Hi All,
 
It has been observed that Case 1 above may not work properly in later TD releases. Hence, you can rewrite it as follows -

 

SELECT regexp_replace('SatyakiDe','([[:lower:]]{1,})([[:upper:]]{1,})','\1 \2') AS COL_VAL;

COLA             COL_VAL
---------------- ----------------------------------------
SatyakiDe        Satyaki De

 
 
Also, one more useful query is shown below -
 
Case 7,
 

SELECT regexp_replace('satyaki1de0loves3to8work2on2sql0and2bi4tools1','[^0-9]+','',1,0,'i') AS DER_VAL;

	COLA	DER_VAL
1	satyaki1de0loves3to8work2on2sql0and2bi4tools1	1038220241

 
 
As you can see, all the non-numeric characters have been filtered out of the string and only the digits are kept. These sorts of queries are very useful in lots of different business scenarios.
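As a cross-check of the idea (not Teradata code), the same strip-everything-but-digits logic can be sketched in Python, where `re.sub` with an empty replacement plays the role of REGEXP_REPLACE:

```python
import re

src = 'satyaki1de0loves3to8work2on2sql0and2bi4tools1'

# Replace every run of non-digit characters with the empty string,
# leaving only the numbers behind
digits_only = re.sub(r'[^0-9]+', '', src)

print(digits_only)  # 1038220241
```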

 
 
Regards.
 
Satyaki De.


SD2012 9 posts Joined 05/12
21 Feb 2014

Due to some technical issue, I'm not able to post my next topic in a single post. :( So I'm splitting it into a couple of posts, and hopefully I'll be able to share some additional information with you.
 
Now, let's see some other functionality.
 
REGEXP_SIMILAR has functionality similar to REGEXP_LIKE in Oracle.
 
Let's see a couple of such cases.
 
Let's prepare a table with some dummy data -

 

SELECT * FROM dbc.dbcinfo;

	InfoKey	InfoData
1	VERSION	14.10.01.05
2	RELEASE	14.10.01.04
3	LANGUAGE SUPPORT MODE	Standard


CREATE MULTISET VOLATILE TABLE TEST_T1
  (
            COL1  VARCHAR(10)
  ) 
  ON COMMIT PRESERVE ROWS;

  INSERT INTO TEST_T1 VALUES('456')
  ;INSERT INTO TEST_T1 VALUES('123x')
  ;INSERT INTO TEST_T1 VALUES('x123')
  ;INSERT INTO TEST_T1 VALUES('y')
  ;INSERT INTO TEST_T1 VALUES('+789')
  ;INSERT INTO TEST_T1 VALUES('-789')
  ;INSERT INTO TEST_T1 VALUES('159-')
  ;INSERT INTO TEST_T1 VALUES('-1-');

 
 
Let's check the data now -

 

SELECT *
FROM TEST_T1;

	COL1
1	123x
2	456
3	x123
4	+789
5	-789
6	y
7	159-
8	-1-

 
 
Continue .....


SD2012 9 posts Joined 05/12
21 Feb 2014

Let's look into the various scenarios now -
 
Case 1 (Returns Mixed Numbers, Signed Numbers & Non Numbers),
 

SELECT *
 FROM TEST_T1
 WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=0;

	COL1
1	123x
2	x123
3	+789
4	-789
5	y
6	159-
7	-1-

 
 
 
Case 2 (Returns Only Unsigned Positive Numbers),
 

SELECT *
 FROM TEST_T1
 WHERE REGEXP_SIMILAR(COL1,'^[0-9]+$','c')=1;

	COL1
1	456

 
 
Case 3 (Returns All Numbers including Positive, Negative & unsigned),
 

SELECT *
 FROM TEST_T1
 WHERE REGEXP_SIMILAR(COL1,'^[+-]?[0-9]+[+-]?$','c')=1;

	COL1
1	456
2	+789
3	-789
4	159-
5	-1-

 
 
Case 4 (Returns Only Non Numbers i.e. Characters),
 

SELECT *
 FROM TEST_T1
 WHERE REGEXP_SIMILAR(COL1,'[^0-9]+','c')=1;

	COL1
1	y
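Since REGEXP_SIMILAR tests the pattern against the entire string, its closest everyday analogue is a full-string match. The four cases above can be sketched with Python's `re.fullmatch` (an illustration for demonstration only, not the Teradata implementation):

```python
import re

values = ['123x', '456', 'x123', '+789', '-789', 'y', '159-', '-1-']

# Case 2: unsigned positive numbers only (full-string match on digits)
unsigned = [v for v in values if re.fullmatch(r'[0-9]+', v)]

# Case 3: numbers with an optional leading and/or trailing sign
signed = [v for v in values if re.fullmatch(r'[+-]?[0-9]+[+-]?', v)]

# Case 4: strings containing no digits at all
non_nums = [v for v in values if re.fullmatch(r'[^0-9]+', v)]

print(unsigned)  # ['456']
print(signed)    # ['456', '+789', '-789', '159-', '-1-']
print(non_nums)  # ['y']
```

Case 1 from the post is simply the complement of Case 2 (the values where the full-string digit match fails).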

 
 
Hope this will give you some additional idea. :)
 
My objective is to provide basic information so that friends migrating from other popular databases can write better SQL in TD, and so that new TD developers can get a flavor of this powerful feature and apply it properly. :D
 
Really appreciate your time to read this post.
 
Regards.
 
Satyaki De.


Adeel Chaudhry 773 posts Joined 04/08
24 Feb 2014

Great work in progress Satyaki .... you should really consider publishing an article on this. :)
 
Can you also add some performance stats on these?


SD2012 9 posts Joined 05/12
24 Mar 2014

Hi All,
 
I was busy with some other work. Anyway, today I'll explain the performance impact of these regular expressions in Teradata.
 
These functions are newly introduced, so they may take some time to settle; in other words, we may see some patches before they can be considered stable and ready to use in TD.
 
Before going into this, we must understand these functions and where to use them properly. Obviously, we want them in places where Teradata's older, stable functions cannot achieve the result in a single SQL statement, or where we would otherwise need a stored procedure to implement the business logic. Hence it would be unfair to compare them against a simple solution; rather, we should weigh complex parsing logic, and the total cost of such stored procedures or related processes, against these functions. In those cases regular expressions will be very handy, I believe.
 
Let's consider one simple case -
 
Let's consider the following string - "SANTA'S JOYFULL GIFT".
 
I want to fetch the part of the string up to the first space character, i.e. as per the business logic it should return "SANTA'S".
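The intent of both approaches compared below can be sketched quickly in Python (illustrative only; `re.match` stands in for REGEXP_SUBSTR and plain slicing for the SUBSTR/POSITION combination):

```python
import re

s = "SANTA'S JOYFULL GIFT"

# Regex approach: regexp_substr(S_ORG_NM, '[^ ]+') -- the first run of non-spaces
first_word = re.match(r'[^ ]+', s).group()

# Conventional approach: take characters up to the first space
# (appending a space first so strings without any space still work)
conventional = s[:(s + ' ').index(' ')]

print(first_word)    # SANTA'S
print(conventional)  # SANTA'S
```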
 
I'll test this with a significant volume of data and compare the explain plans of the conventional approach and the regular expression.
 
Let's check the explain plan for the SQL that uses conventional functions -
 

EXPLAIN SELECT C_KEY,
        C_CD,
        S_ORG_NM,
        SUBSTR(S_ORG_NM,1,POSITION(' ' IN S_ORG_NM||' ')) AS DER_S_ORG_NM
FROM MASTER_CLAIM
WHERE C_CD = '555';

  1) First, we lock EDW_CORE_DB.MASTER_CLAIM in view ETL_VIEWS.MASTER_CLAIM for
     access.
  2) Next, we do an all-AMPs RETRIEVE step from EDW_CORE_DB.MASTER_CLAIM in view
     ETL_VIEWS.MASTER_CLAIM by way of an all-rows scan with a condition
     of ("EDW_CORE_DB.MASTER_CLAIM in view ETL_VIEWS.MASTER_CLAIM.C_CD = '555 '")
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The input table will not be cached in memory, but it is eligible
     for synchronized scanning.  The size of Spool 1 is estimated with
     high confidence to be 38,212,793 rows (5,082,301,469 bytes).  The
     estimated time for this step is 40.02 seconds.
  3) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 40.02 seconds.

 
Now, let's try the same with the Regular expression -
 

EXPLAIN SELECT C_KEY,
        C_CD,
        S_ORG_NM,
        regexp_substr(S_ORG_NM,'[^ ]+') AS DER_S_ORG_NM
FROM MASTER_CLAIM
WHERE C_CD = '555';

  1) First, we lock EDW_CORE_DB.MASTER_CLAIM in view ETL_VIEWS.MASTER_CLAIM for
     access.
  2) Next, we do an all-AMPs RETRIEVE step from EDW_CORE_DB.MASTER_CLAIM in view
     ETL_VIEWS.MASTER_CLAIM by way of an all-rows scan with a condition
     of ("EDW_CORE_DB.MASTER_CLAIM in view ETL_VIEWS.MASTER_CLAIM.C_CD = '555 '")
     into Spool 1 (group_amps), which is built locally on the AMPs.
     The input table will not be cached in memory, but it is eligible
     for synchronized scanning.  The size of Spool 1 is estimated with
     high confidence to be 38,212,793 rows (105,696,585,438 bytes).
     The estimated time for this step is 40.02 seconds.
  3) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 40.02 seconds.

 
So, between the two we really can't find much difference in the plans except the number of bytes transferred. In both cases the estimated time shows 40.02 seconds.
 
Now let's check the actual time taken.
 
First, let us create one volatile table and record the total creation time -
 

CREATE MULTISET VOLATILE TABLE VT1
AS
    (
         SELECT C_KEY,
				        C_CD,
				        S_ORG_NM,
				        SUBSTR(S_ORG_NM,1,POSITION(' ' IN S_ORG_NM||' ')) AS DER_S_ORG_NM
		 FROM MASTER_CLAIM
		 WHERE C_CD = '555'
    )
WITH DATA
ON COMMIT
PRESERVE ROWS;

 
 
And, the response is as follows -
 

--CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:05.076

 
 
Let's create another VT with the new approach -
 

CREATE MULTISET VOLATILE TABLE VT2
AS
    (
         SELECT C_KEY,
				        C_CD,
				        S_ORG_NM,
				        regexp_substr(S_ORG_NM,'[^ ]+') AS DER_S_ORG_NM
		 FROM MASTER_CLAIM
		 WHERE C_CD = '555'
    )
WITH DATA
ON COMMIT
PRESERVE ROWS;

 
 
And, the response time -
 

--CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:05.762

 
 
So, as you can see, there is not much difference between the old process and the new one.
 
And the total number of records we tested this on -
 

SELECT COUNT(*)
FROM VT3;

40,781,904

 
So, from the above you can see that we've tested this on a significant number of rows, which is very common in any TD system.
 
Let's verify that both SQLs actually return the same values. To do that, we'll create one more VT as follows -
 

CREATE MULTISET VOLATILE TABLE VT3
AS
    (
         SELECT a.C_KEY,
				        a.C_CD,
				        a.S_ORG_NM,
				        a.DER_S_ORG_NM AS OLD_PRCHSR_ORG_NM,
				        b.DER_S_ORG_NM AS NEW_PRCHSR_ORG_NM,
				        CHAR_LENGTH(a.DER_S_ORG_NM) AS OLD_PRCHSR_ORG_NM_LEN,
				        CHAR_LENGTH(b.DER_S_ORG_NM) AS NEW_PRCHSR_ORG_NM_LEN
		 FROM VT1 a,
		             VT2 b
		 WHERE a.C_KEY = b.C_KEY
    )
WITH DATA
ON COMMIT
PRESERVE ROWS;

--CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:06.864

 
 
Now, lets test the output -
 

SELECT *
FROM VT3
WHERE OLD_PRCHSR_ORG_NM <> NEW_PRCHSR_ORG_NM;

--SELECT completed. 0 rows returned. Elapsed Time =  00:00:01.763

 
So, from the above simulation we can establish that the performance difference between the conventional SQL and the SQL using regular expressions is negligible.
 
But again, I must clearly say: regular expressions will be ideal where we would otherwise need multiple SQL statements or PL/SQL, or where we need complex parsing that is difficult to implement in a single SQL statement.
 
Hope this gives you some clarity. :D


SD2012 9 posts Joined 05/12
27 Mar 2014

Today I'll show a couple of very useful pieces of logic implemented in Teradata using its regular expression support.
 
Two very popular requests come from developers across different databases:
 
1. How to split comma-separated values into separate rows.
 
2. How to combine separate rows into one value (just the opposite of case 1).
 
The second option is in especially high demand, as cross-platform database professionals, particularly Oracle developers, look for this kind of implementation; Oracle has built-in functions for it: LISTAGG, wm_concat and group_concat.
 
Let's check the solution -
 
Case 1,
 
Let's create the table & prepare some data -
 

CREATE MULTISET TABLE ETL_DATA.PARSE_STR
  (
     SEQ_NO       INTEGER,
     SRC_STR     VARCHAR(70)
  );

CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.864

 
Let's insert some data -
 

      INSERT INTO ETL_DATA.PARSE_STR VALUES(1,'RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY')
      ;INSERT INTO ETL_DATA.PARSE_STR VALUES(2,'TUNKAI,SAYAN,BABU,PAPU')
      ;INSERT INTO ETL_DATA.PARSE_STR VALUES(3,'IK,ATBIS,SAPMUNDA');

 
 
Let's check the value -
 

SEQ_NO	        SRC_STR
---------       -----------------------------------------------------------------------
	1	RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY
	2	TUNKAI,SAYAN,BABU,PAPU
	3	IK,ATBIS,SAPMUNDA

 
Fine. Now our objective is to split these comma-separated values onto separate rows.
 

SELECT b.SEQ_NO,
                regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
            PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND  (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 )
ORDER BY 1,2;

 
And, let's check the output -
 

    SEQ_NO	SRC_STR
     -----      ----------------------
1	1	ANUPAM
2	1	BIRESWAR
3	1	RAM
4	1	SUJAY
5	1	TRIDIB
6	2	BABU
7	2	PAPU
8	2	SAYAN
9	2	TUNKAI
10	3	ATBIS
11	3	IK
12	3	SAPMUNDA

 
Gr8! The result is coming out as expected.
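To make the mechanics of the query transparent: the calendar table only acts as a row generator for token positions 1..n, and `regexp_substr(src, '[^,]+', 1, n)` picks out the n-th comma-delimited token. A rough Python sketch of the same splitting (an illustration, not the Teradata mechanism):

```python
import re

rows = [(1, 'RAM,TRIDIB,ANUPAM,BIRESWAR,SUJAY'),
        (2, 'TUNKAI,SAYAN,BABU,PAPU'),
        (3, 'IK,ATBIS,SAPMUNDA')]

# findall(r'[^,]+') returns every comma-delimited token, i.e. the full set of
# values that regexp_substr(..., '[^,]+', 1, n) yields for n = 1..token_count
split_rows = [(seq, token) for seq, src in rows
              for token in re.findall(r'[^,]+', src)]

print(len(split_rows))  # 12
```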
 
Case 2 (substitute for LISTAGG, wm_concat and group_concat in Oracle),
 
We'll do this in two small steps for better understanding and performance.
 
First, let us create another table -
 

CREATE MULTISET TABLE ETL_DATA.WM_CONCAT_TAB
   (
      SEQ_NO   INTEGER,
      SRC_STR VARCHAR(20)
   );
   
CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.230

 
Good. Now we'll populate this table with the data from Case 1, which provides exactly the input test data we're expecting for Case 2.
 
Let's insert those data -
 

INSERT INTO ETL_DATA.WM_CONCAT_TAB
SELECT b.SEQ_NO,
                regexp_substr(b.SRC_STR,'[^,]+',1,day_of_calendar) AS SRC_STR
FROM sys_calendar.calendar ,
            PARSE_STR b
WHERE day_of_calendar BETWEEN 1 AND  (LENGTH(b.SRC_STR) - LENGTH(regexp_replace(b.SRC_STR,'[^A-Z]+','',1,0,'i'))+1 );

 
Let's check the data -
 

SEQ_NO	SRC_STR
------  --------------------
1	ANUPAM
1	BIRESWAR
1	RAM
1	SUJAY
1	TRIDIB
2	BABU
2	PAPU
2	SAYAN
2	TUNKAI
3	ATBIS
3	IK
3	SAPMUNDA

 
As you know, TD has significant restrictions on hierarchical and recursive queries. So in this step we'll build a parent-child relationship, like employee and manager in the popular employee table; once we have that kind of relation, we can easily fit it into the TD model.
 
Let's create this intermediate table. Here we'll map each current row to its next row, which is also a very useful technique in itself. Oracle has the LEAD and LAG functions to achieve this, but here we have to work a little harder.
 

CREATE MULTISET VOLATILE TABLE VT_SRC_ARRNG
AS
     (
			SELECT SEQ_NO,
			       SRC_STR,
			       MAX(SRC_STR) OVER(
			                            PARTITION BY SEQ_NO
			                            ORDER BY SEQ_NO, SRC_STR
			                            ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING 
			                        ) AS PREV_SRC_STR,
			       COUNT(*)  OVER(
			                        PARTITION BY SEQ_NO
	                                      ) AS MAX_RECUR_CNT
			FROM WM_CONCAT_TAB
      )
WITH DATA
ON COMMIT
PRESERVE ROWS;

CREATE TABLE completed. 0 rows processed. Elapsed Time =  00:00:01.102

 
 
Let's look at the output -
 

SELECT *
FROM VT_SRC_ARRNG
ORDER BY 1,2;




SEQ_NO	SRC_STR	 PREV_SRC_STR	 MAX_RECUR_CNT
-----   -------  --------------- ---------------------
1	ANUPAM	        BIRESWAR 5
1	BIRESWAR	RAM	 5
1	RAM	        SUJAY	 5
1	SUJAY	        TRIDIB	 5
1	TRIDIB	        ?	 5
2	BABU	        PAPU	 4
2	PAPU	        SAYAN	 4
2	SAYAN	        TUNKAI	 4
2	TUNKAI	        ?	 4
3	ATBIS	        IK	 3
3	IK	        SAPMUNDA 3
3	SAPMUNDA	?	 3

 
 
Fine. From the VT above we can see that every source string is paired with the string that follows it (held in the PREV_SRC_STR column). We've also noted, via MAX_RECUR_CNT, how many levels there are in each SEQ_NO window; we'll use this column later.
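In other words, the windowed MAX with `ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING` is emulating Oracle's LEAD: pair each value with the next value in its SEQ_NO partition, with NULL for the last row. A small Python sketch of that pairing (illustrative only):

```python
from itertools import groupby

rows = [(1, 'ANUPAM'), (1, 'BIRESWAR'), (1, 'RAM'), (1, 'SUJAY'), (1, 'TRIDIB'),
        (2, 'BABU'), (2, 'PAPU'), (2, 'SAYAN'), (2, 'TUNKAI'),
        (3, 'ATBIS'), (3, 'IK'), (3, 'SAPMUNDA')]

lead_rows = []
for seq, grp in groupby(rows, key=lambda r: r[0]):    # one partition per SEQ_NO
    vals = [v for _, v in grp]
    for cur, nxt in zip(vals, vals[1:] + [None]):     # LEAD: next value, None at the end
        lead_rows.append((seq, cur, nxt, len(vals)))  # len(vals) plays MAX_RECUR_CNT

print(lead_rows[:2])
```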
 
Let's move on to the second and final part -
 
Let's aggregate the values based on SEQ_NO and join them with commas -
 

WITH RECURSIVE WM_CONCAT(SEQ_NO, SRC_STR, PREV_SRC_STR, MAX_RECUR_CNT, LVL,  COMMA_SEP_STR)
AS
     (
        SELECT SEQ_NO,
               SRC_STR,
               PREV_SRC_STR,
               MAX_RECUR_CNT,
               1 AS LVL,
               CAST( '' AS VARCHAR(100)) AS COMMA_SEP_STR
       FROM VT_SRC_ARRNG
       WHERE  PREV_SRC_STR IS NULL
       UNION ALL
       SELECT  b.SEQ_NO,
               b.SRC_STR,
               b.PREV_SRC_STR,
               b.MAX_RECUR_CNT,
               c.LVL+1 AS LVL,
               c.COMMA_SEP_STR||b.SRC_STR||',' AS COMMA_SEP_STR
       FROM VT_SRC_ARRNG b,
               WM_CONCAT c
       WHERE     c.SRC_STR =  b.PREV_SRC_STR
     )
SELECT k.SEQ_NO,
       k.AGGR_STR
FROM (			     
	SELECT SEQ_NO,
	       SRC_STR,
	       LVL,
	       MAX_RECUR_CNT,
	       MIN(CASE
	             WHEN LVL = 1 THEN
	                SRC_STR
	           ELSE
	              'ZZZZZ'
	           END   ) OVER(
	                             PARTITION BY SEQ_NO
	                             ORDER BY LVL ASC 
	                       ) ROOT_SRC_STR,
	       COMMA_SEP_STR||ROOT_SRC_STR AS AGGR_STR
	FROM WM_CONCAT
	)  k
WHERE k.LVL = k.MAX_RECUR_CNT
ORDER BY 1,2;

 
 
Let's check the output -
 

SEQ_NO	AGGR_STR
------- ---------------------------
1	SUJAY,RAM,BIRESWAR,ANUPAM,TRIDIB
2	SAYAN,PAPU,BABU,TUNKAI
3	IK,ATBIS,SAPMUNDA

 
I guess we've done it. :D
 
So you can achieve the same result without writing any UDF.
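For comparison, the end result the recursive query emulates, Oracle's LISTAGG / wm_concat, is simply "one comma-joined string per group". A minimal Python sketch of that target behaviour (note the SQL above does not guarantee ordering within a group, so only the set of members should be compared):

```python
from itertools import groupby

rows = [(1, 'ANUPAM'), (1, 'BIRESWAR'), (1, 'RAM'), (1, 'SUJAY'), (1, 'TRIDIB'),
        (2, 'BABU'), (2, 'PAPU'), (2, 'SAYAN'), (2, 'TUNKAI'),
        (3, 'ATBIS'), (3, 'IK'), (3, 'SAPMUNDA')]

# LISTAGG / wm_concat equivalent: one comma-separated string per SEQ_NO
aggregated = {seq: ','.join(v for _, v in grp)
              for seq, grp in groupby(rows, key=lambda r: r[0])}

print(aggregated[3])  # ATBIS,IK,SAPMUNDA
```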


bhartiya007 29 posts Joined 03/14
01 Jun 2015

Excellent post ..lots of learning..thanks Satyaki de.

@Amit

gururaj_cse 1 post Joined 05/09
15 Oct 2015

I have a requirement to remove non-numeric characters from a field, and I used regexp_replace to do that. I work on Teradata 14.0.

I executed the statement below in SQL Assistant and got the correct result.
SELECT    REGEXP_REPLACE('1234!6]464! 86[463) 264% 00','[^0-9]+','',1,0,'I');

SELECT    REGEXP_REPLACE('1234!6]464! 86[463) 264% 00','[^0-9]+','',1,0,'I');
	REGEXP_REPLACE('1234!6]464! 86[463) 264% 00','[^0-9]+','',1,
1	123464648646326400

When I try to submit the same from a mainframe BTEQ script, it's not picking up '[', ']' and '^'. See the result below.
I tried checking my PROFILE on the mainframe and everything looks as expected. I turned HEX ON and even looked at the hex values of [, ] and ^.
But when I submit the job, it doesn't give me the proper results.

SELECT                                                                          
REGEXP_REPLACE('1234!6 464! 86 463) 264% 00','  0-9 +','',1,0,'I');             
 *** Query completed. One row found. One column returned.                       
 *** Total elapsed time was 0.01 seconds.                                       
15/10/15                     SELECT REGEXP_REPLACE('1234!6 464! 86 463) 264% 00'
REGEXP_REPLACE('1234!6 464! 86 463) 264% 00','  0-9 +','',1,                    
--------------------------------------------------------------------------------
1234!6 464! 86 463) 264% 00                                                     

Any suggestions?

Regards,
Guru

Fred 1096 posts Joined 08/04
17 Oct 2015

Please start a new topic for a new question.
What hex values do you see for those special characters? Teradata's default EBCDIC mapping is not the same as the frequently used Code Page 037 (EBCDIC037_0E), which can be installed as an option.
x'AD' [
x'BD' ]
x'5F' ^


jhangilp 7 posts Joined 10/15
22 Oct 2015

Hi, I am trying to run some SQL statements with regular expressions through a BTEQ and I'm running into a similar issue.
The SQL statement runs fine in Teradata SQL Assistant, but when I run it through a BTEQ on the mainframe -
 
SELECT CAST(COUNT(*) AS BIGINT) FROM DB_NM.TB_NM WHERE not ((REGEXP_SIMILAR(TRIM(COL_NM), '[0-9]+', 'c')=1 or REGEXP_SIMILAR(TRIM(COL_NM),  '[+-]?[0-9]+.', 'c')=1)) and not TRIM(COL_NM) = '';
 
what actually gets executed is
 
SELECT CAST(COUNT(*) AS BIGINT) FROM DB_NM.TB_NM WHERE not ((REGEXP_SIMILAR(TRIM(COL_NM), '.0-9.+', 'c')=1 or REGEXP_SIMILAR(TRIM(COL_NM),  '.+-.?.0-9.+.', 'c')=1)) and not TRIM(COL_NM) = '';
 
If anyone has any suggestions on how fix this it would be great.

 

 
