Convert the grammar below into Chomsky Normal Form. Give all the intermediate steps.
S -> AB | aB
A -> aab|lambda
B -> bbA
Ok so the first thing I did was add a new start variable S0
so now I have
S0 -> S
S -> AB | aB
A -> aab|lambda
B -> bbA
then I removed all of the lambda rules:
S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb
Then I checked for S->S
and A->B
type rules which did not exist. And that was the answer I came up with, do I need to do anything further or did I do anything wrong?
Wikipedia says:
Continuing your work:
Instead of using
|
to denote different choices, split a rule into multiple rules.Create new rules
Y -> a
andZ -> b
because we will need them soon.S -> aB
is not of the formS -> BC
becausea
is a terminal. So changea
intoY
:Do the same for the
B -> bb
rule:For
A -> aab
, createC -> YY
; forB -> bbA
, createD -> ZZ
:For
S -> B
, duplicate the one rule whereS
occurs on the right hand side and inline the rule:Deal with the rules
S0 -> B
andS0 -> S
by joining the right hand side to the left hand sides of other rules. Also, delete the orphaned rules (where the LHS symbol never gets used on RHS):And we're done. Phew!
Alternative answer: The grammar can only produce a finite number of strings, namely 6.
You can now condense this back to Chomsky Normal Form by hand.
By substitution, we can find the set of all strings produced. Your initial rules:
First split up the
S
rule:Now substitute what A and B expand into:
Expand these again to get:
To change this finite set to Chomsky Normal Form, it suffices to do it by brute force without any intelligent factoring. First we introduce two terminal rules:
Now for each string, we consume the first letter with a terminal variable and the remaining letters with a new variables. For example, like this:
We just go through this process for all 6 strings, generating a lot of new intermediate variables.
Without getting into too much theory and proofs(you could look at this in Wikipedia), there are a few things you must do when converting a Context Free Grammar to Chomsky Normal Form, you generally have to perform four Normal-Form Transformations. First, you need to identify all the variables that can yield the empty string(lambda/epsilon), directly or indirectly - (Lambda-Free form). Second, you need to remove unit productions - (Unit-Free form). Third, you need to find all the variables that are live/useful (Usefulness). Four, you need to find all the reachable symbols (Reachable). At each step you might or might not have a new grammar. So for your problem this is what I came up with...
Context-Free Grammar
Remove lambda/epsilon
Remove unit produtions
Determine live symbols
Remove unreachable
Replace all mixed strings with solid nonterminals
Chomsky Normal Form