I am going through exercises regarding regular expressions, and I am really unsure on how to do this.
The regular expression is:
((a*)(b*))* ∪ (a*)
I am really bad at this, but I think that ((a*)(b*))*
can be simplified to (a ∪ b)*
But if this is right, than the last ∪ (a*)
is really just a repetition, so I figure the whole expression can be simplified to (a ∪ b)*.
Does this seem correct?
Edit: ∪ stands for union
You are right. (a*b*)*
can match any string of a's and b's, so can (a U b)*
, therefore they are equivalent. (a U b)*
intersect a*
is a*
so a*
is a subset of (a U b)*
. Consequently, the whole expression can be simplified to (a U b)*
.
What ((a*)(b*))*U(a*)
really means is (copied from here)
NODE EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1 (0 or more times
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
a* 'a' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
b* 'b' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
)* end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
--------------------------------------------------------------------------------
U 'U'
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
a* 'a' (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of \4
This expression currently matches all of these sequences: abUa bU U aabbUaa aaUaa aaU Uaa bbU ababUaa aabbaabbUaa
(look at here)
There is no way to simplify this, without removing capturing groups and remaining order of letters.
EDIT: If U
in your regex statement stands for "union", then this expression is invalid. There is no way to union anything in regex. There is only OR
and you need to use |
(pipe) for that. If you want to union ((a*)(b*))*
and (a*)
then probably it will be ((a*)(b*))*
, but it will still match anything like abaab
.
Still, capturing groups in your regex statement are useless, so something like [ab]*
is enough to match any number of a
's and b
's.