Do all regular expressions halt?

2020-02-12 02:06发布

问题:

Is there any regular expression that will, for some input string, search for a match forever?

回答1:

For a finite input, there is no formal regular expression that will not halt.

Any formal regular expression can be translated into a Deterministic Finite Automata. A DFA reads the input one character at a time, and, at the end of the input, you are either in an accepting state or in a non-accepting state. If the state is accepting, then the input matches the regular expression. Otherwise, it doesn't.

Now, most "regular expression" libraries support things which are not regular expressions, such as back references. As long as you keep away from those features, and have a finite input, you are guaranteed a halt. If you don't... depending on exactly what you are using, you might well not be guaranteed a halt. Perl allows arbitrary code to be inserted, for instance, and arbitrary, turing-machine equivalent code is not guaranteed to halt.

Now, if the input is infinite, then trivial regular expressions can be found which will never halt. For example, ".*".



回答2:

Formal regex is actually a method of describing a deterministic finite automaton for parsing strings. The regex "matches" if the DFA winds up in an accepting state at the end of input. Since the DFA reads its input sequentially, it will always halt when it reaches the end of the input, and whether or not there is a match is merely a matter of examining which state of the DFA it halts at.

Substring matching is effectively the same, except instead of being forced to halt at the end of one read-through of the string, the DFA would instead be forced to halt after reading each possible substring once - still a finite case. (Yes, most regex engines implement this in a bit more optimized manner than just throwing every possible substring at a DFA - but conceptually it the limit is still there).

Thus the only possible case in which the DFA would not halt is if the input were infinite, which is generally considered beyond the scope of the halting problem.



回答3:

I imagine, it is not possible to find a regular expression that doesn't halt.

The size of your input is finite. The maximum size of any matched subgroup of the regular expression is, at max, the size of your input.

Unless the algorithm being used is pretty stupid (going over cases multiple times), the number of matched subgroups, will too, be finite.

So, it will halt.



回答4:

According to this question, every regular expression halts.



回答5:

Not in the sense you are describing, you can have some very inefficient regular expressions that take up loads of resources and end up killing the regex engine, this is not the same as halting.

I don't think halting really applies here, as the other commenters of this post have so astutely pointed out. http://en.wikipedia.org/wiki/Halting_problem



回答6:

I can't imagine an input string that would be parsed forever, although an infinitely long string would be parsed forever. Given that a regular expression can describe a regular language, which is potentially an infinite set of words, then a regex can describe a language of infinite words, including words of infinite length. However, no input string can be infinitely long, so at some point it would have to halt.

For example, if a*b is accepted in the language, and you have an infinitely long string of 'a's, then yes, the regex would never halt. Practically, though, this is impossible.



回答7:

Yes.

A regular expression can be represented by a finite state machine. Every time you receive an atomic input, it will cause any well-defined FSM to transition to a known state.

The exception is when you have infinite input, but this is not applicable to the halting problem because it deals with finite input. When you have a finite state machine and finite input, it is always possible to determine whether your machine will halt or not.

http://en.wikipedia.org/wiki/Finite_state_machine



回答8:

+1 for Daniel's answer: all finite inputs cause true regex's (i.e., without backreferences or other non-regex features) to halt, and regex's are equivalent to DFAs.

Bonus: Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

http://swtch.com/~rsc/regexp/regexp1.html

Note that the two graphs at the top of the article have different scales on the y-axis: one is seconds, the other is microseconds!