regex to grab text if code exists

Posted 2019-07-18 10:57

Question:

I'm trying to build a regex that captures the code value when the code name exists.

For example:

{(en56), (sc45), (da77), (cd29)}
{(en56), (sc45), (cd29)}

I would write a regex like {[(]en(?<en>\d{2}).*[(]sc(?<sc>\d{2}).*[(]da(?<da>\d{2}).*[(]cd(?<cd>\d{2}).*

This matches the first line, so I can extract its values. How can I make da optional so that input without it still matches?

When I tried with ?, the da value is no longer captured from the first line: {[(]en(?<en>\d{2}).*[(]sc(?<sc>\d{2}).*([(]da(?<da>\d{2}))?.*[(]cd(?<cd>\d{2}).*

Answer 1:

New Answer

I just noticed you're using Qt regular expressions.
Since Qt uses the PCRE engine, you can take advantage of conditionals
not only to find the items optionally, but also to find them out of order.
Whether or not they are in order, the regex still finds them.

So, all the bases are covered. And you get a look at some advanced
regular expression technique.

The idea is to find 1-4 items. This is done with a group construct and
a range quantifier (?: ... | ... | ... | ...){1,4}

The upper bound is 4 because that is the number of alternatives in the group.

Finally, each item is guarded with a conditional to ensure that the
item is not matched again. For example, (?(<en>)(?!)) means: if group en
has already matched, apply (?!), an empty negative lookahead that always
fails. This is needed so that the upper limit 4 refers to unique items,
while the range makes each one optional.

The side benefit is that the items can match out of order: the order
in which they appear in the source text is irrelevant.

Good luck! And hope you get a chance to try this out ..

Formatted and tested:

 #  {(?:.*?(?:(?(<en>)(?!))[(]en(?<en>\d{2})|(?(<sc>)(?!))[(]sc(?<sc>\d{2})|(?(<da>)(?!))[(]da(?<da>\d{2})|(?(<cd>)(?!))[(]cd(?<cd>\d{2}))){1,4}

 # Match 1-4 'Out-Of-Order' unique items
 # --------------------------------------------
 {
 (?:                  # Cluster start - loop to find out of order items
      .*? 
      (?:
           (?(<en>)             # Condition, not matched 'en' before
                (?!)
           )
           [(] en
           (?<en> \d{2} )       # (1)
        |                     # or,
           (?(<sc>)             # Condition, not matched 'sc' before
                (?!)
           )
           [(] sc
           (?<sc> \d{2} )       # (2)
        |                     # or,
           (?(<da>)             # Condition, not matched 'da' before
                (?!)
           )
           [(] da
           (?<da> \d{2} )       # (3)
        |                     # or,
           (?(<cd>)             # Condition, not matched 'cd' before
                (?!)
           )
           [(] cd
           (?<cd> \d{2} )       # (4)
      )
 ){1,4}               # Cluster end - find  1 to 4 unique items

Test Input

{(sc45), (en56), (da77), (cd29)}
{(da77), (cd29)}
{(en56), (sc45), (cd29)}
{(da77), (cd29) (en56), (sc45)}
{(sc45)}
{(en56), (cd29), (sc45)}

Output

 **  Grp 0      -  ( pos 0 , len 30 ) 
{(sc45), (en56), (da77), (cd29  
 **  Grp 1 [en] -  ( pos 12 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 4 , len 2 ) 
45  
 **  Grp 3 [da] -  ( pos 20 , len 2 ) 
77  
 **  Grp 4 [cd] -  ( pos 28 , len 2 ) 
29
------------  
 **  Grp 0      -  ( pos 34 , len 14 ) 
{(da77), (cd29  
 **  Grp 1 [en] -  NULL 
 **  Grp 2 [sc] -  NULL 
 **  Grp 3 [da] -  ( pos 38 , len 2 ) 
77  
 **  Grp 4 [cd] -  ( pos 46 , len 2 ) 
29  
------------  
 **  Grp 0      -  ( pos 52 , len 22 ) 
{(en56), (sc45), (cd29  
 **  Grp 1 [en] -  ( pos 56 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 64 , len 2 ) 
45  
 **  Grp 3 [da] -  NULL 
 **  Grp 4 [cd] -  ( pos 72 , len 2 ) 
29  
------------  
 **  Grp 0      -  ( pos 78 , len 29 ) 
{(da77), (cd29) (en56), (sc45  
 **  Grp 1 [en] -  ( pos 97 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 105 , len 2 ) 
45  
 **  Grp 3 [da] -  ( pos 82 , len 2 ) 
77  
 **  Grp 4 [cd] -  ( pos 90 , len 2 ) 
29  
------------  
 **  Grp 0      -  ( pos 111 , len 6 ) 
{(sc45  
 **  Grp 1 [en] -  NULL 
 **  Grp 2 [sc] -  ( pos 115 , len 2 ) 
45  
 **  Grp 3 [da] -  NULL 
 **  Grp 4 [cd] -  NULL 
------------  
 **  Grp 0      -  ( pos 121 , len 22 ) 
{(en56), (cd29), (sc45  
 **  Grp 1 [en] -  ( pos 125 , len 2 ) 
56  
 **  Grp 2 [sc] -  ( pos 141 , len 2 ) 
45  
 **  Grp 3 [da] -  NULL 
 **  Grp 4 [cd] -  ( pos 133 , len 2 ) 
29  
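
For readers who want to cross-check the captures above outside Qt, here is a minimal Python sketch. It does not use the conditional pattern from this answer; it swaps in a simpler technique (scan for every item, keep the first value seen per name), which should give the same named values for the test input. The names ITEM and extract_codes are hypothetical, and unlike the regex above it does not require the leading {.

```python
import re

# One item: "(" + a known code name + two digits. The closing ")" is not required,
# mirroring the pattern above, which also stops after the digits.
ITEM = re.compile(r"[(](en|sc|da|cd)(\d{2})")


def extract_codes(line):
    """Return the first value seen for each code name; absent names map to None."""
    found = dict.fromkeys(("en", "sc", "da", "cd"))
    for name, value in ITEM.findall(line):
        # Keep only the first occurrence of each name. This plays the role of the
        # conditional guard, which stops an item from matching a second time.
        if found[name] is None:
            found[name] = value
    return found


test_input = [
    "{(sc45), (en56), (da77), (cd29)}",
    "{(da77), (cd29)}",
    "{(en56), (sc45), (cd29)}",
    "{(da77), (cd29) (en56), (sc45)}",
    "{(sc45)}",
    "{(en56), (cd29), (sc45)}",
]

for line in test_input:
    print(line, "->", extract_codes(line))
```

The trade-off is that this takes one extra loop in application code, whereas the conditional regex does everything in a single match and can be dropped into an existing regex-only pipeline.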

Benchmark

Regex1:   {(?:.*?(?:(?(<en>)(?!))[(]en(?<en>\d{2})|(?(<sc>)(?!))[(]sc(?<sc>\d{2})|(?(<da>)(?!))[(]da(?<da>\d{2})|(?(<cd>)(?!))[(]cd(?<cd>\d{2}))){1,4}
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   6
Elapsed Time:    3.41 s,   3411.71 ms,   3411714 µs


Answer 2:

You can make that part of the regex optional by enclosing it in a non-capturing group with a ? quantifier, which matches one or zero occurrences of the subpattern it quantifies:

{[(]en(?<en>\d{2}).*[(]sc(?<sc>\d{2})(?:.*[(]da(?<da>\d{2}))?.*[(]cd(?<cd>\d{2}).*
                                     ^^^                   ^^

See regex demo

Using this technique you can make more parts of your regex optional if necessary.
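
The difference between the attempt in the question and this pattern is where the .* sits. In the question, a greedy .* comes before the optional group, so it can consume "(da77)" itself and leave the optional group to match nothing. Moving the .* inside the optional group makes the engine try to reach "(da" first. The sketch below compares the two in Python, as an illustration only: Python's re module spells named groups (?P<name>...) rather than (?<name>...), and the names ATTEMPT, FIXED and extract are hypothetical.

```python
import re

# Attempt from the question: the greedy .* before the optional group swallows "(da77)".
ATTEMPT = re.compile(
    r"\{[(]en(?P<en>\d{2}).*[(]sc(?P<sc>\d{2})"
    r".*([(]da(?P<da>\d{2}))?"
    r".*[(]cd(?P<cd>\d{2}).*"
)

# Pattern from this answer: the .* lives inside the optional non-capturing group.
FIXED = re.compile(
    r"\{[(]en(?P<en>\d{2}).*[(]sc(?P<sc>\d{2})"
    r"(?:.*[(]da(?P<da>\d{2}))?"
    r".*[(]cd(?P<cd>\d{2}).*"
)


def extract(pattern, line):
    """Return the named groups as a dict, or None when the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


with_da = "{(en56), (sc45), (da77), (cd29)}"
without_da = "{(en56), (sc45), (cd29)}"

print(extract(ATTEMPT, with_da)["da"])    # None: the value is lost
print(extract(FIXED, with_da)["da"])      # 77
print(extract(FIXED, without_da)["da"])   # None: da is absent, the line still matches
print(extract(FIXED, without_da)["cd"])   # 29
```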