Merge several regexes to a single one

2020-03-18 04:19发布

I have several regexes (actually several thousands), and I must check if one string matches any of these regexes. It is not very efficient, so I would like to merge all these regexes as a single regex.

For example, if a have these regexes:

  • 'foo *bar'
  • 'foo *zip'
  • 'zap *bar'

I would like to obtain something like 'foo *(bar|zip)|zap *bar'.

Is there some algorithm, library or tool to do this?

4条回答
聊天终结者
2楼-- · 2020-03-18 04:24

In theory a regex is a (nondeterministic)finite-state automata; thus they can be merged and minimized. You can take a look at this as a starting point.

Beware, though, that this might not be the most correct answer. Why do you have to deal with several thousands regular expressions? I can only fathom the maintentance hell of such a thing. Perhaps you should consider writing a parser and a grammar -- much easily done (and grammars are more powerful than regexps anyways).

查看更多
够拽才男人
3楼-- · 2020-03-18 04:26

I very much doubt it, on the grounds that any such tool would have to be very complex to deal with all the different ways in which a regex could be combined.

If the regexes you have are relatively simple, such as in your examples, you may have some luck writing your own, however.

查看更多
Fickle 薄情
4楼-- · 2020-03-18 04:33

You can just concatenate the regexes using or (|) (and anchors for the beginning/end of string).

Most good regex libraries optimize their finite state automata after they build it from your regex. PCRE does that, for instance.

This step usually takes care of your optimization problem, ie. they apply most of the transformations you would have to do "by hand".

查看更多
淡お忘
5楼-- · 2020-03-18 04:36

I can't imagine, even if possible, that the resultant regex would be any more efficient.

查看更多
登录 后发表回答