Regular Expression for REGEXP_SUBSTR in Oracle

2019-05-28 10:27发布

问题:

I have the following text to search:

#S7Z OK
#Wed Feb 18 07:16:26 GMT 2015
expiration=10.0
lastModified=1424192425832
length=466472
path=/name/_master_/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf 
userType=

The words proceeding each = are the names of properties. For each property name, I'd like to get the property value. That means I'm looking for a regular expression to use with regexp_substr to get the value of each known property.

Something like this:

SELECT REGEXP_SUBSTR(
'#S7Z OK
#Wed Feb 18 07:16:26 GMT 2015
expiration=10.0
lastModified=1424192425832
length=466472
path=/name/_master_/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf 
userType=',
'path=.+')
FROM dual

which returns: path=/name/master/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf

But I only want the value, that is "/name/master/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf ". It should also work for expiration, lastModified etc., that is, I don't just want to search for a url but any kind of value.

How can I achieve that in one regular expression?

回答1:

SELECT REGEXP_SUBSTR(
'#S7Z OK
#Wed Feb 18 07:16:26 GMT 2015
expiration=10.0
lastModified=1424192425832
length=466472
path=/name/_master_/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf 
userType=',
'path=(.+)', 1, 1, null, 1)
FROM dual;


回答2:

'. +=' as second argument '' as third argument



回答3:

Here is how you might capture all of the name=value pairs all at once. Note that I use an explicit quantifier {1,10} in the regular expression to prevent catastrophic backtracking. (This particular regex might not actually be subject to that, in which case you could replace the explicit quantifier with +. But best not to take chances!)

WITH s1 AS (
    SELECT '#S7Z OK
#Wed Feb 18 07:16:26 GMT 2015
expiration=10.0
lastModified=1424192425832
length=466472
path=/name/_master_/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf 
userType=' AS str
      FROM dual
)
SELECT SUBSTR(name_value, 1, INSTR(name_value, '=') - 1) AS myname
     , SUBSTR(name_value, INSTR(name_value, '=') + 1, LENGTH(name_value)) AS myvalue
  FROM (
    SELECT REGEXP_SUBSTR(REGEXP_SUBSTR(s1.str,'(\S+=\S*\s*){1,10}'), '\S+', 1, LEVEL) AS name_value
      FROM s1
   CONNECT BY REGEXP_SUBSTR(REGEXP_SUBSTR(s1.str,'(\S+=\S*\s*){1,10}'), '\S+', 1, LEVEL) IS NOT NULL
);

Output as follows:

MYNAME       | MYVALUE
-------------------------------------------------------------------------
expiration   | 10.0
lastModified | 1424192425832
length       | 466472
path         | /name/_master_/563/5638ad54-8079-4399-ba2b-3257b6e6c7fd.pdf
userType     | (null)

Please see SQL Fiddle here.

Note that I could have used REGEXP_SUBSTR(name_value, '^[^=]+'), etc., in the outer query but I figured there were enough regexes in this query (it's a bit expensive, I am sure SUBSTR() plus INSTR() is cheaper!). Also, note that if you're using Oracle 11g or higher, the CONNECT BY clause can be replaced by the following:

CONNECT BY LEVEL <= REGEXP_COUNT(REGEXP_SUBSTR(s1.str,'(\S+=\S*\s*){1,10}'), '\S+')

See revised SQL Fiddle.