I have built a lexical analyzer for a C-like language. For example, given the input below, it produces the following result.
Input
int i = 0 ; int j = i + 3;
Output
int KEYWORD
i IDENTIFIER
= OPERATOR
; PUNCTUATION
int KEYWORD
j IDENTIFIER
= OPERATOR
i IDENTIFIER
+ OPERATOR
3 INTEGER_CONSTANT
; PUNCTUATION
As you may have noticed, the input in the example above is syntactically correct. However, when I give the analyzer something like the input below, it fails.
Input
int i = "1.2.2222.+\<++++
I have written a class whose sole purpose is to break the above string into small parts (I call them literals; I don't know if that is the correct term) that can be matched with a regex or validated with a DFA.
The problem arises in ambiguous situations such as +, where a + can be an addition operator, part of an upcoming integer literal, or part of an increment operator. My teacher's requirement is explained in the next paragraph.
If a + is preceded by a +, it should be processed as an increment operator. In simple words, the program must try every possibility and choose the best match. That means if the program text has some valid input, then some invalid input, then again some valid input, the analyzer should not stop at the invalid input but should keep finding the correct literals. I am against this, though. My argument is that if a program string becomes invalid at a certain index, the analyzer should stop processing, because we are not writing an error-checking system after all.
I have tried to code all the possibilities using a complex (for me) nested if-else structure and have had partial success. Can any of you suggest a simpler and more elegant solution? I have also thought of structuring this problem as a state machine, but I am not too sure, because I have never implemented a state machine other than a DFA that just answers yes or no for pattern matching.
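To make the requirement concrete: what is described above resembles the rule usually called longest match (or "maximal munch"), combined with simple error recovery. The following is only a minimal sketch in Python under stated assumptions, not the analyzer from this question. The keyword list, the FLOAT_CONSTANT and ERROR categories, and the names TOKEN_SPEC, MASTER and tokenize are all hypothetical. Python's re module picks the first alternative that matches rather than the longest one, so the longer patterns (such as ++) are deliberately listed before the shorter ones (such as +).

```python
import re

# Hypothetical token table. Category names follow the output shown above,
# except FLOAT_CONSTANT and ERROR, which are assumptions for this sketch.
# Order matters: `re` takes the first alternative that matches, so longer
# patterns come before their shorter prefixes ("++" before "+").
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:int|float|char|if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("FLOAT_CONSTANT", r"\d+\.\d+"),
    ("INTEGER_CONSTANT", r"\d+"),
    ("OPERATOR", r"\+\+|--|==|<=|>=|[-+*/=<>]"),
    ("PUNCTUATION", r"[;,(){}]"),
    ("SKIP", r"\s+"),   # whitespace is matched but not reported
    ("ERROR", r"."),    # any other single character: report it and move on
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (lexeme, category) pairs; invalid characters become ERROR tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((match.group(), kind))
    return tokens


if __name__ == "__main__":
    for lexeme, kind in tokenize(r'int i = "1.2.2222.+\<++++'):
        print(lexeme, kind)
```

With this table, an unrecognised character does not stop the scan: it is emitted as a one-character ERROR token and scanning resumes at the next character, which is one way to satisfy the "keep finding the correct literals" requirement. Stopping at the first error instead would only need a check for the ERROR category inside the loop.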
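For reference, the state-machine idea mentioned above could look roughly like the sketch below. It is a hypothetical, deliberately tiny scanner in Python that only knows identifiers, integers, + and ++; the State enum, the scan function and the ERROR category are assumptions made for illustration, keywords are not distinguished from identifiers, and ASCII input is assumed. The point is the shape: one state per "kind of lexeme in progress", and a character that ends a lexeme is not consumed but scanned again from the start state.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()   # between lexemes
    IDENT = auto()   # inside an identifier
    NUMBER = auto()  # inside an integer constant
    PLUS = auto()    # one '+' seen; the next character decides '+' vs '++'


def scan(text):
    """Hypothetical hand-coded scanner for identifiers, integers, '+' and '++'."""
    tokens = []
    state = State.START
    lexeme = ""
    text = text + "\0"  # sentinel so the last lexeme is flushed
    i = 0
    while i < len(text):
        ch = text[i]
        if state is State.START:
            if ch.isalpha() or ch == "_":
                state, lexeme = State.IDENT, ch
            elif ch.isdigit():
                state, lexeme = State.NUMBER, ch
            elif ch == "+":
                state, lexeme = State.PLUS, ch
            elif not ch.isspace() and ch != "\0":
                tokens.append((ch, "ERROR"))  # report it and keep scanning
            i += 1
        elif state is State.IDENT:
            if ch.isalnum() or ch == "_":
                lexeme += ch
                i += 1
            else:
                tokens.append((lexeme, "IDENTIFIER"))
                state = State.START  # ch is not consumed; it is scanned again
        elif state is State.NUMBER:
            if ch.isdigit():
                lexeme += ch
                i += 1
            else:
                tokens.append((lexeme, "INTEGER_CONSTANT"))
                state = State.START  # ch is not consumed; it is scanned again
        elif state is State.PLUS:
            if ch == "+":
                tokens.append(("++", "OPERATOR"))
                i += 1  # the second '+' belongs to this token
            else:
                tokens.append(("+", "OPERATOR"))
            state = State.START
    return tokens


if __name__ == "__main__":
    print(scan("i+++3 $ x"))
```

Compared with nested if-else over the whole input, each branch here only has to answer one question: given the current state and the current character, does the lexeme continue or end? Adding another operator or literal type would mean adding a state and its transitions rather than another level of nesting.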
As you can see, this is a homework question, but I am not asking for just code.