All Forums Database
31 Jul 2013
Search for first substring in string with condition

Any help on string search. No problem if recursive. I do not know how to do the following.
I want to find the first position of “b” in the string where “b” is not between 2 “a”’s. The select statement I have made returns the value 6 for row nr (Seq=1). I want it to return the value 17, the first b outside the two “a”. Is this possible?
CREATE MULTISET VOLATILE TABLE T1 (Seq INTEGER, Str VARCHAR(100)) ON COMMIT PRESERVE ROWS;
INSERT INTO T1 VALUES (1,'a----b----a-----b----b------');
INSERT INTO T1 VALUES (2,'b----b----b------');
INSERT INTO T1 VALUES (3,'---b--a--b--a--b------');
INSERT INTO T1 VALUES (4,'--b----b----b------');
 
SELECT Seq,
       Str,
       INDEX(Str,'b') AS idx,
       CASE
         WHEN Str LIKE '%b%' AND NOT Str LIKE '%a%b%a%' THEN INDEX(Str,'b')
         ELSE INDEX(Str,'b')
       END
FROM T1;
DROP TABLE T1;
Peter Schwennesen

ulrich 816 posts Joined 09/09
31 Jul 2013

which version are you on?

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

ulrich 816 posts Joined 09/09
31 Jul 2013

performance depend on your data volumes but at least this does the trick with your data ;-)
I used the volatile table to make it a bit more clearer 
First splitt your data into chars and keep only  the 'a' and 'b'

create volatile table t2 as
(
select seq,
       str,
       id,
       substr(str,id,1) as v
from t1
     cross join (select current_date - calendar_date as id from sys_calendar.calendar where id between 1 and 100) i
where 
     id <= characters(str)
     and v in ('a','b')
) with data 
primary index (seq)
on commit preserve rows
;

the check for 'aba' with olap or keep the b if it is the fist
and finally aggregate everything to keep only the first.

select seq, str, min(id)
from 
(
	select seq, 
	       str, 
	       id, 
	       v
	from t2
	qualify 
	         (min(v) over (partition by seq order by id rows between 1 preceding and 1 preceding) 
	         || v  
	         || min(v) over (partition by seq order by id rows between 1 following and 1 following) <> 'aba'
	        
	         and v = 'b'
	         ) or (
	          v = 'b' and id = min(id) over (partition by seq)
	         )
) as t
group by 1,2;

a bit of bruce force but as mentioned it does the trick

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

dnoeth 4628 posts Joined 11/04
31 Jul 2013

Hi Peter,
based on the data you provided this should work (i just use POSITION instead of INDEX):

SELECT Seq, Str,
   CASE
      WHEN POSITION('a' IN Str) = 0
        OR POSITION('a' IN Str) > POSITION('b' IN Str)
        OR Str NOT LIKE '%a%b%a%' 
      THEN POSITION('b' IN Str)
      ELSE POSITION('b' IN SUBSTR(Str, POSITION('b' IN Str) + 1)) + POSITION('b' IN Str)
   END
FROM T1

Of course your specs are a bit vague :-)
Are following cases possible and what's the expected result?
'aba--aba---b'
'abba---b'
'a-ab-a---b'

This might require recursion or maybe a regular expression in TD14 using REGEXP_INST.

 

Dieter

Dieter

01 Aug 2013

Hi, thanks for the answers. I look at it.
Yes everything is possible.
Problem is: find first "b" not between two "a". There therefore (“-“ just represent random number of char)
ababbb should return me 4
a-b-a-bbb should return 7
baba should return 1
The original problem is that I am trying to clean out comments from my code. So in real life I am not looking for A’s and B’s, but “/*”, and “*/” and “—“.
But I ran into the problem that in some places the /* do not represent a comment, but is part of a text string. Therefore:
SELECT ‘SOME TEXT’, /* this is …. Here I want to find the position of “/*”.
But:
SELECT ‘SOME /* OTHER TEXT’, /*
Here I should not consider the first /* as a comment but only the second one.

dnoeth 4628 posts Joined 11/04
01 Aug 2013

Hi Peter,
in cases like this it's better to post the real life question instead of an overly simplified one :-)
 
Seems like you try to clean DBQL data from comments, this will be a tough one...
What's your TD release?
Do you have at least oTRANSLATE or oREPLACE installed on your system? 
Best case would probably be REGEXP_REPLACE, a regular expression should be able to do this task.
DIeter

Dieter

ulrich 816 posts Joined 09/09
02 Aug 2013

I agree with Dieter. But I also don't think this will work within the DB - at least not for the big SQL's which spann multiple lines in qrylogSQL.
I use a java programm to do this which parses the SQL from left to right and checks what it find first (e.g. ',/*,--) and then checks for the corrosponding closing  part (',*/.\n).
Easy and efficient...

feel free to donate bitcoin:12kgAUHFUqvG2sQgaRBXFhCwyf9HXdkGud

You must sign in to leave a comment.