Ambiguity in transition: How to process string in

2020-02-06 10:16发布

I have made DFA from a given regular expression to match the test string. There are some cases in which .* occurs. ( for example .*ab ) . Let say now the machine is in state 1. In the DFA, .* refers to the transition for all the characters to itself and another transition for a from the state 1 for 'a'. If test string contains 'a' then what could be the transition because from state 1, machine can go to two states that is not possible in DFA.

2条回答
Melony?
2楼-- · 2020-02-06 10:32

Matches with such regular expressions happen via backtracking. When there is an ambiguity about the next state, the evaluation takes the first choice and remembers it made the choice. If taking the first choice results in a failure to match, the evaluation backtracks to the last choice it made and tries the next available choice from that state.

I'm not sure such a mechanism maps to a strict DFA.

查看更多
叼着烟拽天下
3楼-- · 2020-02-06 10:37

I start with fundamental with your example so that one can find it helpful
Any class of automata can have two forms:

  • Deterministic
  • Non-Deterministic.

In Deterministic model: we only have single choice (or say no choice) to move from one congratulation to next configuration.
In Deterministic model of Finite Automate (DFA): for every possible combination of state (Q) and language symbol (Σ), we always have unique next state.

Definition of transition function for DFA: δ:Q×Σ → Q

δ(q0, a) → q1
            ^ only single choice

So, In DFA every possible move is definite from one state to next state.

Whereas,
In Non-Deterministic model: we can have more than one choice for next configuration.
And in Non-deterministaic model of Finite Automata (NFA): output is set of states for some combination of state (Q) and language symbol (Σ).

Definition of transition function for NFA: δ:Q×Σ → 2Q = ⊆ Q

δ(q0, a) → {q1, q2, q3}
               ^ is set, Not single state (more than one choice)

In NFA, we can have more then one choice for next state. That is you calls ambiguity in transition NFA.

(your example)
Suppose language symbols are Σ = {a, b} and the language/regular expression is (a + b)*ab. The finite automata for this language you down might be probably like below:

[autometa]
Your question is: Which state to move when we have more than one choices for next state?
I make it more general question.

How to process string in NFA?

I am considering automata model as an acceptor that accept a string if it belong to the language of automata.(Notice: we can have an automaton as a transducer), below is my answer with an above example

In above NFA, we find 5 tapular objects:

1.  Σ :  {a, b}  
2.  Q :  {q1, ,q2, q3}  
3.  q1:  initial state
4.  F :  {q3}       <---F is set of final state
5.  δ : Transition rules in above diagram: 
         δ(q1, a) → { q1, q2 }
         δ(q1, b) → { q1 }
         δ(q2, b) → { q3 } 

The exampled finite automata is an actually an NFA because in production rule δ(q1, a) → { q1, q2 }, if we get a symbol while present state is q1 then next states can be either q1 or q2 (more than one choices). So when we process a string in NFA, we get extra path to travel wherever their is a symbol a to be process while current state is q1.

A string is accepted by an NFA, if there is some sequence of possible moves that will put the machine in a final state at the end of string processing. And the set of all string those have some path to reach to any final state in set F from initial state is called language of NFA:

We can also write, "what is language defined by a NFA?" as:

L(nfa) = { w ⊆ Σ* | δ*(q1, w) ∩ F ≠ ∅}

when I was new, this was too complex to understand to me but its really not

L(nfa) says: all strings consists of language symbols = (w ⊆ Σ* ) are in language; if (|) the set of states get after processing of w form initial state (=δ*(q1, w) ) contains some states in the set of Final states (hence intersection with final states is not empty = δ*(q1, w) ∩ F ≠ ∅). So while processing a string in Σ*, we need to keep track of all the possible paths.

Example-1: to process string abab though above NFS:

--►(q1)---a---►(q1)---b---►(q1)---a---►(q1)---b---►(q1)
     \                       \ 
      a                       a  
       \                       \
        ▼                       ▼
       (q2)                     (q2)---b---►((q3))
        |                      
        b                      
        |                      
        ▼                                 
       (q3)                   
        |
        a
        |
       halt  

Above diagram show: How to process a string abab in NFA?

A halt: means string could not process completely so it can't be consider a accepted string in this path

String abab could process completely in two directions so δ*(q1, w) = { q1, q3}.

and intersection of δ*(q1, w) with set of final states is {q3}:

                  {q1, q3} ∩ F 
             ==>  {q1, q3} ∩ {q3}     
             ==>  {q3}  ≠  ∅ 

In this way, string ababa is in language L(nfa).

Example-2: String from Σ* is abba and following is how to process:

--►(q1)---a---►(q1)---b---►(q1)---b---►(q1)---a---►(q1)
     \                                   \ 
      a                                   a  
       \                                   \
        ▼                                   ▼
       (q2)                                (q2)
        |                      
        b                      
        |                      
        ▼                                 
       (q3)                   
        |
        b
        |
       halt    

For string abba set of reachable states is δ*(q1, w) = { q1, q2} and no state is final state in this set this implies => its intersection with F is ∅ a empty set, hence string abba is not an accepted string (and not in language).

This is the way we process a string in Non-deterministic Finite Automata.

Some additional important notes:

  • In case of finite automata's both Deterministic and Non-Deterministic models are equally capable. Non-Deterministic model doesn't have extra capability to define a language. hence scope of NFA and DFA are same that is Regular Language. (this is not case for all class of automate for example scope of PDA !=NPDA)

  • Non-deterministic models are more useful for theoretical purpose, comparatively essay to draw. Whereas for implementation purpose we always desire deterministic model (minimized for efficiency). And fortunately in class of finite autometa every Non-deterministic model can be converted into an equivalent Deterministic one. We have algorithmic method to convert an NFA into DFA.

  • An information represented by a single state in DFA, can be represented by a combination of NFA states, hence number of states in NFA are less than their equivalent DFA. (proof are available numberOfStates(DFA)<= 2 power numberOfStates(NFA) as all set combinations are powerset)

The DFA for above regular language is as below:

(a+b)*ab-dfa

Using this DFA you will always find a unique path from initial state to final state for any string in Σ* and instead of set you will gets to a single reachable final state and if that state belongs to set of final that input string is said to be accepted string (in language) otherwise not/

(your expression .*ab and (a + b)*ab are same usually in theoretical science we don't use . dot operator other then concatenation)

查看更多
登录 后发表回答