-->

Automata with kleene star

2019-07-20 15:50发布

问题:

Im learning about automata. Can you please help me understand how automata with Kleene closure works? Let's say I have letters a,b,c and I need to find text that ends with Kleene star - like ab*bac - how will it work?

回答1:

The question seems to be more about how an automaton would handle Kleene closure than what Kleene closure means.

With a simple regular expression, e.g., abc, it's pretty straightforward to design an automaton to recognize it. Each state essentially tells you where you are in the expression so far. State 0 means it's seen nothing yet. State 1 means it's seen a. State 2 means it's seen ab. Etc.

The difficulty with Kleene closure is that a pattern like ab*bc introduces ambiguity. Once the automaton has seen the a and is then faced with a b, it doesn't know whether that b is part of the b* or the literal b that follows it, and it won't know until it reads more symbols--maybe many more.

The simplistic answer is that the automaton simply has a state that literally means it doesn't know yet which path was taken.

In simple cases, you can build this automaton directly. In general cases, you usually build something called a non-deterministic finite automaton. You can either simulate the NDFA, or--if performance is critical--you can apply an algorithm that converts the NDFA to a deterministic one. The algorithm essentially generates all the ambiguous states for you.



回答2:

The Kleene star('*') means you can have as many occurrences of the character as you want (0 or more). a* will match any number of a's.

(ab)* will match any number of the string "ab"

If you are trying to match an actual asterisk in an expression, the way you would write it depends entirely on the syntax of the regex you are working with. For the general case, the backwards slash \ is used as an escape character:

\* will match an asterisk.

For recognizing a pattern at the end, use concatenation:

(a U b)*c* will match any string that contains 0 or more 'c's at the end, preceded by any number of a's or b's.

For matching text that ends with a Kleene star, again, you can have 0 or more occurrences of the string:

ab(c)* - Possible matches: ab, abc abcc, abccc, etc.

a(bc)* - Possible matches: a, abc, abcbc, abcbcbc, etc.



回答3:

Your expression ab*bac in English would read something like:

a followed by 0 or more b followed by bac

strings that would evaluate as a match to the regular expression if used for search

abac
abbbbbbbbbbac
abbac

strings that would not match

abaca //added extra literal
bac //missing leading a

As stated in the previous answer actually searching for a * would require an escape character which is implementation specific and would require knowledge of your language/library of choice.