Intro
(you can skip to "What if..." if you get bored with intros)
This question is not specific to VBScript (I just happened to use it in this case): I am looking for a solution that applies to regular expressions in general (editors included).
This started when I wanted to adapt Example 4, where 3 capture groups are used to split data across 3 cells in MS Excel. I needed to capture one entire pattern and then, within it, capture 3 other patterns. However, in the same expression, I also needed to capture another kind of pattern and, again, capture 3 other patterns within it (yeah I know... but before pointing the nutjob finger, please finish reading).
I first thought of named capturing groups, but then I realized that I should not «mix named and numbered capturing groups», since doing so «is not recommended because flavors are inconsistent in how the groups are numbered».
Then I looked into VBScript's SubMatches and «non-capturing» groups, and got a working solution for a specific case:
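Dim regEx As Object, rgxMatches As Object, mtx As Object
Dim C As Range, strPattern As String, strInput As String

' Build the RegExp once, outside the loop (late binding, no library reference needed)
Set regEx = CreateObject("VBScript.RegExp")
strPattern = "(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)"
With regEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = False
    .Pattern = strPattern
End With

' Myrange (set elsewhere) holds the cells to parse
For Each C In Myrange
    strInput = C.Value
    Set rgxMatches = regEx.Execute(strInput)
    If rgxMatches.Count = 0 Then
        ' Neither "parent" alternative matched this cell
        C.Offset(0, 1) = "(Not matched)"
    Else
        For Each mtx In rgxMatches
            If mtx.SubMatches(0) <> "" Then
                ' 1st parent matched: groups 1-3 (SubMatches 0-2)
                C.Offset(0, 1) = mtx.SubMatches(0)
                C.Offset(0, 2) = mtx.SubMatches(1)
                C.Offset(0, 3) = mtx.SubMatches(2)
            ElseIf mtx.SubMatches(3) <> "" Then
                ' 2nd parent matched: groups 4-6 (SubMatches 3-5)
                C.Offset(0, 1) = mtx.SubMatches(3)
                C.Offset(0, 2) = mtx.SubMatches(4)
                C.Offset(0, 3) = mtx.SubMatches(5)
            End If
        Next
    End If
Next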
Here's a demo of the regex on Rubular. Given these inputs:
124;12;3
my id1:213 my id2:232 my word:ins4yanrgx
:8587459 :18254182540215 :dcpt
0;1;2
It fills the first 2 cells with numbers and the 3rd with a number or a word. Basically, I used a non-capturing group containing 2 "parent" patterns as alternatives ("parents" = broad patterns in which I want to detect other sub-patterns). If the 1st parent pattern matched (its 1st capture group is not empty), I place that value and the remaining captured groups of this pattern in the 3 cells. If not, I check whether the 4th capture group (belonging to the 2nd parent pattern) was matched and, if so, place it and the remaining sub-patterns in the same 3 cells.
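Since the question is not VBScript-specific, here is a rough Python equivalent of the same approach, as a sketch using Python's `re` module (`split_cell` is a made-up name). It shows the group-offset bookkeeping: which block of groups to read depends on which alternative matched. Note that `re` groups are numbered from 1, while VBScript's `SubMatches` is indexed from 0, so `SubMatches(0)` corresponds to group 1.

```python
import re

# Same pattern as the VBScript code: two "parent" alternatives inside one
# non-capturing group; groups 1-3 belong to the 1st, groups 4-6 to the 2nd.
PATTERN = re.compile(
    r"(?:^([0-9]+);([0-9]+);([0-9]+)$|^.*:([0-9]+)\s.*:([0-9]+).*:([a-zA-Z0-9]+)$)",
    re.MULTILINE,
)


def split_cell(text):
    """Return the values that would be written to the 3 cells on the right."""
    m = PATTERN.search(text)
    if m is None:
        return ["(Not matched)"]
    if m.group(1) is not None:       # 1st parent matched: use groups 1-3
        return list(m.group(1, 2, 3))
    return list(m.group(4, 5, 6))    # otherwise the 2nd parent did: groups 4-6


for line in ["124;12;3",
             "my id1:213 my id2:232 my word:ins4yanrgx",
             ":8587459 :18254182540215 :dcpt",
             "0;1;2"]:
    print(split_cell(line))
```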
What if...
Instead of having something like this:
(?:^(\d+);(\d+);(\d+)$|^.*:(\d+)\s.*:(\d+).*:(\w+)$|what(ever))
Something like this could be possible:
(#:^(\d+);(\d+);(\d+)$)|(#:^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(#:what(ever))
where (#:, instead of creating a non-capturing group, would create a "parent" numbered capture group.
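The proposed (#: can be roughly approximated today by making each parent an ordinary capturing group and recording by hand which group numbers are parents. A minimal Python sketch under that assumption (`PARENTS` and `parent_match` are hypothetical names; the list must be kept in sync with the pattern manually, which is exactly the bookkeeping (#: would remove):

```python
import re

# Hypothetical emulation of "(#:": every parent alternative is an ordinary
# capturing group, and PARENTS records their group numbers by hand.
PATTERN = re.compile(r"(^(\d+);(\d+);(\d+)$)|(^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(what(ever))")
PARENTS = [1, 5, 9]  # group 1 -> children 2-4, group 5 -> 6-8, group 9 -> 10


def parent_match(text):
    """Return (1-based parent index, tuple of that parent's child groups), or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    bounds = PARENTS + [PATTERN.groups + 1]
    for i, p in enumerate(PARENTS):
        if m.group(p) is not None:
            # the children are the groups between this parent and the next one
            return i + 1, tuple(m.group(g) for g in range(p + 1, bounds[i + 1]))
    return None  # not reached when every alternative is a parent group


for line in ["124;12;3", "my id1:213 my id2:232 my word:ins4yanrgx", "whatever"]:
    print(line, "->", parent_match(line))
```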
In this way I could do something similar to Example 4:
C.Offset(0, 1) = regEx.Replace(strInput, "#$1")
C.Offset(0, 2) = regEx.Replace(strInput, "#$2")
C.Offset(0, 3) = regEx.Replace(strInput, "#$3")
It would try the parent patterns in order until one of them matches, with #$n referring to the n-th child group of that parent (the first match would be returned and, ideally, the remaining parents wouldn't be tried).
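The #$n idea can be sketched the same way in Python, again assuming parents are plain capturing groups listed by hand. This version relies on `Match.lastindex`: when each alternative is wrapped in its own outer group, that outer group is the last one to close, so `lastindex` is the number of the parent that matched. `expand_parent` and the `#$n` template syntax are hypothetical, not a real `re` feature:

```python
import re

# Parents are plain capturing groups; PARENTS records their numbers by hand.
PATTERN = re.compile(r"(^(\d+);(\d+);(\d+)$)|(^.*:(\d+)\s.*:(\d+).*:(\w+)$)|(what(ever))")
PARENTS = [1, 5, 9]


def expand_parent(template, text):
    """Replace each hypothetical '#$n' in template with child n of the matched parent."""
    m = PATTERN.search(text)
    if m is None:
        return None
    parent = m.lastindex  # the outer group closes last, so this is the matched parent
    # first group number past this parent's children
    end = next((p for p in PARENTS if p > parent), PATTERN.groups + 1)

    def child(ref):
        g = parent + int(ref.group(1))
        return (m.group(g) or "") if g < end else ""

    return re.sub(r"#\$(\d+)", child, template)


line = ":8587459 :18254182540215 :dcpt"
cells = [expand_parent(t, line) for t in ("#$1", "#$2", "#$3")]
print(cells)
```

In backtracking engines such as Python's `re`, alternation is ordered: at a given starting position, the first alternative that lets the whole pattern match is used and later alternatives are not tried there, which is the "first parent wins" behaviour described above.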
Is there something like this already? Or am I missing some regex feature that already allows this?
Other possible variations:
- refer to the parent and child pattern directly, e.g. #2$3 (this would be the equivalent of $6 in my example);
- create as many capturing groups as necessary within others (I guess it would be more complex, but also the most interesting part), e.g. with a regex (same syntax) like
(#:^_(?:(#:(\d+):\w+-(\d))|(#:\w+:(\d+)-(\d+)))_$)|(#:^\w+:\s+(#:(\w+);\d-(\d+))$)
and fetching ##$1 in inputs like:
_123:smt-4_ would return 123
_ott:432-10_ would return 432
yant: special;3-45235 would return special
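Both variations can be prototyped on top of an ordinary engine if the group nesting is recovered from the pattern text. A rough Python sketch, assuming every (#: is written as a plain capturing "(" and that child groups are not themselves nested (otherwise a child would be mistaken for a parent); `group_tree` and `fetch` are hypothetical helpers, and the pattern scanner is deliberately simplified:

```python
import re


def group_tree(pattern):
    """Map each capturing group number to its enclosing capturing group (0 = top level).
    Simplified scanner: handles escapes, [...] classes and (?...) groups, but
    would miscount named groups such as (?P<name>...)."""
    enclosing = {}
    stack = []  # one entry per open paren: group number, or None if non-capturing
    count, i, in_class = 0, 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"
        elif ch == "[":
            in_class = True
        elif ch == "(":
            if pattern.startswith("(?", i):
                stack.append(None)
            else:
                count += 1
                enclosing[count] = next((g for g in reversed(stack) if g), 0)
                stack.append(count)
        elif ch == ")":
            stack.pop()
        i += 1
    return enclosing


def fetch(pattern, ref, text):
    """Resolve a hypothetical reference such as '#2$3' or '##$1'.
    Each '#k' descends into the k-th child group, a bare '#' descends into the
    child that took part in the match, and '$n' returns that group's n-th child."""
    m = re.search(pattern, text)
    if m is None:
        return None
    tree = group_tree(pattern)

    def children(g):
        return [c for c, p in tree.items() if p == g]

    levels, n = re.fullmatch(r"((?:#\d*)+)\$(\d+)", ref).groups()
    node = 0
    for k in re.findall(r"#(\d*)", levels):
        kids = children(node)
        if k:
            node = kids[int(k) - 1]
        else:
            node = next((c for c in kids if m.group(c) is not None), None)
            if node is None:
                return None
    return m.group(children(node)[int(n) - 1])


# The nested example, with every "(#:" written as a plain capturing "(":
NESTED = r"(^_(?:((\d+):\w+-(\d))|(\w+:(\d+)-(\d+)))_$)|(^\w+:\s+((\w+);\d-(\d+))$)"
for s in ["_123:smt-4_", "_ott:432-10_", "yant: special;3-45235"]:
    print(s, "->", fetch(NESTED, "##$1", s))
```

With the same helpers, `fetch(r"(^(\d+);(\d+);(\d+)$)|(^.*:(\d+)\s.*:(\d+).*:(\w+)$)", "#2$3", text)` resolves to the 3rd child of the 2nd parent, i.e. the group called $6 in the original example.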
Please tell me if you notice any mistakes or flaws in this logic and I will edit asap.