All Forums Database
jnevill 17 posts Joined 03/12
01 Nov 2013
regexp_replace woes

I want to replace all of the whitespace and non alphanumeric characters in a string. It seems like some form of the following should work:

SELECT REGEXP_REPLACE('00 - L* 123-4383', '[^a-zA-Z0-9]+', NULL)

However, that only replaces the initial find of ' - ' and leaves '* ' and '-' in place. I'm not interested in nesting REGEXP_REPLACE and, since I'm dealing in UNICODE, the block of characters that I would have to put into the TRANSLATE function would be unfathomably huge, so that's out.
Surely the usecase of REGEXP_REPLACE isn't so narrow that it can only find a single block of patterned text in a string. Right? Please tell me I'm missing something.

dnoeth 4628 posts Joined 11/04
01 Nov 2013

The documentation on the manuals seems to be buggy (or the implementation doesn't match it).
By default REGEXP_REPLACE replaces only the first occurance, you have to add optional parameters:

REGEXP_REPLACE('00 -j L* 123-4383', -- sourcestring
'[^a-z0-9]+', -- regexp
'', --replacestring
1, --startposition
0, -- occurrence, 0 = all
'i' -- match -> case insensitive
)

Using 'i' as match argument you don't need both uppercase and lowercase a..z.
 
Dieter

Dieter

jnevill 17 posts Joined 03/12
01 Nov 2013

Nice.  I tried to add the 5th option parameter to have it continue searching, but the database yelled at me. Seems that you have to include the 6th parameter or it errors :/  Thanks for the pointer!

dougfy 4 posts Joined 11/12
15 Nov 2013

I have opened DR 168432 and we have done a recheck to see what is not correct with these functions.
 
Here are items where our documentation is wrong:
1) If the startSearch parameter is not specified, it defaults to 1 as stated. However, if it is specified as NULL, we return NULL. The documentations says if NULL, it defaults to 1.
2) If the occurrence arg is not specified, it defaults to 1 (not 0 as stated in the documentation.) If it is specified as NULL, the function returns NULL. The documentations says if NULL, it defaults to 0.
3) The documentation states: If match arg is omitted, an error is returned. This is not true.
Here is where we deviate from Oracle.
1. Oracle allows 3,4,5,or 6 parameters. We don't allow 5. – Will be fixed.
2. Oracle defaults the occurrence parameter to 0 if not specified, we set it to 1. - Will be fixed.
3. Oracle allows an x in the match arg meaning ignore whitespace. – Will investigate if we can implement.
4. If match arg is omitted. We default to case sensitive since there is no NLS_SORT.
5. We don't handle backreferencing correctly. Will be fixed.

aniketd006 5 posts Joined 07/14
17 Jul 2014

Hi there,
Thanks for the comments and suggestion. I am facing some problems with regexp_replace. I am trying to replace all instances of substring which matches the regexp. But it keeps throwing me an error. He are the details:

CREATE TABLE sample_split5 DISTRIBUTE BY HASH(text_without_htmltags) as
SELECT text_with_htmltags, regexp_replace(text_with_htmltags, '<([^>])*>',' ',1,1,'c') AS text_without_htmltags
FROM sample_split;

But i keep getting an error: 

Executed as Single statement.  Failed [34 : 42000] [AsterData][ASTERJDBCDSII](34) ERROR: function regexp_replace(character varying, unknown, unknown, integer, integer, unknown) does not exist () 

Elapsed time = 00:00:00.103 

 

STATEMENT 1: EXPLAIN Statement failed. 

 

I am not sure why this is happening. When I try removing the last 3 arguments it works perfectly fine. Can you help me out here? 

 

Thanks in advance,

Aniket

 

dnoeth 4628 posts Joined 11/04
17 Jul 2014

Hi Aniket,
the REGEXP_REPLACE in Aster has different arguments:
regexp_replace(source, pattern, replacement [, flags ])
This should work:
regexp_replace(text_with_htmltags, '<([^>])*>',' ','g')

Dieter

aniketd006 5 posts Joined 07/14
18 Jul 2014

Thanks Dieter - worked like magic! :)

epicjourney 1 post Joined 02/12
29 Aug 2014

I am attempting to use regext_replace with a string from a table that contains hex '1A'. The query returns error "the string contains untranslatable character. The string should contain alphanumerics only. How do I  find and replace non-unicode characters from this string?? My query is:
select REGEXP_REPLACE(profit_center_name, '[^a-zA-Z0-9]+',  '', 1, 0, 'i')
from scratch.FB_PROFIT_CTR_CORP_BUS_GRP_REL

sri899 5 posts Joined 09/15
02 Sep 2015

Hi all ,
 
I need some inputs for the following issue.
 
I am getting data as below this if I run a SQL
Dept No    Name    Salary     Region     Designation         Stream
1                A          1000       AO              Engineer               IT
1                    B              2000         AO                Engineer                IT
2                     C             3000         JO                    Sr. Engineer        IT
1                   D                 4000        FO                 Manager               IT
1                   E                 500           AO                Engineer               ?
1                  F                    400           AO                 Engineer              ?
 
But the required  output should be like this
Dept No    Name         Salary              Region        Designation           Stream 
1                   A,B         1000,2000            AO                  Engineer               IT
2                     C               3000                   JO                 Sr. Engineer          IT
1                   D                 4000                   FO                 Manager               IT
1                   E                 500                      AO                Engineer               ?
1                  F                    400                      AO               Engineer              ?
 
Conditions applied were :
- If Stream =IT  and data in all the columns(except Name , Salary) are same   then concatenate the  data in to single row with a comma seperator.
- If stream is not IT then display the data as it is.
 
To acheive the above I used the following TD function
cast(regexp_replace(tdstats.udfconcat(trim(A.NAME)),'"','',1,0,'i') as varchar(100)) as Name,
cast(regexp_replace(tdstats.udfconcat(trim(b.Salary)),'"','',1,0,'i') as varchar(100)) as Salary
After using this I am facing an error :
9134 : Results exceeds the output argument  size limit
 
Any inputs / suggestions  are appreciated
Regards
  

ASCHARAN 10 posts Joined 09/15
15 Oct 2015

Hi Guys,
I am asked to use some of the existing functions we have. Whats the use of this statement ?
TRIM(REGEXP_REPLACE(REGEXP_REPLACE(CAST(INOUT_STRING AS CHAR(3000)), '[\t\r\n\v\f|]',' ',1,0, 'I'), '[^[:print:]]','',1,0, 'I'))
 
Thanks.

dnoeth 4628 posts Joined 11/04
15 Oct 2015

First some control characters are replaced by a space:

\f = 0x0C form feed 

\n = 0x0A newline

\r = 0x0D carriage return

\t = 0x09 horizontal tab

\v = 0x0B vertical tab
And then remaining non-printable characters ([^[:print:]]) are removed.

Dieter

swansoca 2 posts Joined 08/14
05 Jan 2016

I'm getting a 3706 error in the following line saying I need something between pfp.plan_other_info and ','. This worked before but not now. My associate said he had the same issue. Can you help?
CAST((CASE WHEN pfp.plan_other_info LIKE ALL ('%evoucher%','%paid%') THEN REGEXP_SUBSTR(SUBSTRING(pfp.plan_other_info ,INDEX(pfp.plan_other_info ,'paid') ,INDEX(pfp.plan_other_info ,'toward') ) , '[0.00-9.00]+') +3 ELSE 0 END) AS DECIMAL(18,2)) AS evoucher_plan_ar,

Fred 1096 posts Joined 08/04
05 Jan 2016

SUBSTRING function requires FROM and optionally FOR keywords inside the parentheses; to use commas instead, change to SUBSTR function.
But note that the third argument is a length and not the ending position, i.e. you may want to use the difference of the INDEX values +1.

swansoca 2 posts Joined 08/14
06 Jan 2016

using SUBSTR seemed to work. Thanks
 

You must sign in to leave a comment.