Is “regex” in modern programming languages really

2019-02-13 07:26发布

Over the years, "regex" pattern matching has been getting more and more powerful to the point where I wonder: is it really just context-sensitive-grammar matching? Is it a variation/extension of context-free-grammar matching? Where is it right now and why don't we just call it that instead of the old, restrictive "regular expression"?

3条回答
地球回转人心会变
2楼-- · 2019-02-13 07:38

There are features in modern regular expression implementations that break the rules of the classic regular expression definition.

For example Microsoft’s .NET Balancing Group (?<name1-name2> … ):

^(?:0(?<L>)|1(?<-L>))*(?(L)(?!))$

This does match the language L₀₁ = {ε, 01, 0011, 000111, … }. But this language is not regular according to the Pumping Lemma.

查看更多
等我变得足够好
3楼-- · 2019-02-13 07:48

The way I see it:

  • Regular languages:
    • Matched by state machines. Only one variable can be used to represent the current "location" in the grammar to be matched: Recursion cannot be implemented
  • Context-free languages:
    • Matched by a stack machine. The current "location" in the grammar is represented by a stack in one or another form. Cannot "remember" anything that occurred before
  • Context-sensitive languages:
    • Most programming languages
    • All Most human languages

I do know of regular expression parsers that allow you to match against something the parser has already encountered, achieving something like a context-sensitive grammar.

Still, regular expression parsers, however sophisticated they may be, don't allow for recursive application of rules, which is a definite requirement for context-free grammars.

The term regex, in my opinion, mostly refers to the syntax used to express those regular grammars (the stars and question marks).

查看更多
女痞
4楼-- · 2019-02-13 08:01

In particular backreferences to capturing parentheses make regular expressions more complex than regular, context-free, or context-sensitive grammars. The name is simply historically grown (as many words). See also this section in Wikipedia and this explanation with an example from Perl.

查看更多
登录 后发表回答