I wanted to make a simple parser, for a "pseudo code" like language(kept rigid), in Java.
A sample pseudo code would be -
//This is a comment
$x1 = readint
$x2 = readint
$dx = $x2 - $x1
#f = $dx / 2
if ($dx > 0)
{
loop while(#f > 1)
{
print(#f)
#f = #f / 2
}
}
Note that above code is rigid in that, there can not be more than one statement on a line, integers start with $, floats start with # etc.
To parse such code, first I can use StringTokenizer
, and then regular expression, to match integer-variables, float-variables, or Keywords.
Is this approach good? For statements in loop, how can i store expressions, so that i don't have to tokenize in each iteration?
I could think of converting expressions (like #f = #f / 2) to polish notation, and then to store in stack. And in each iteration, while popping operands I could replace value for each variable. But is this efficient enough?
Thanks in advance, for any suggestion.
Although I think that it's great that you want to build a parser for a language like this, doing so is much harder than it looks. Parsing is a very well-studied problem and there are many excellent algorithms that you can use, but they are extremely difficult to implement by hand. While you can use tricks like conversions to RPN for smaller examples like parsing expressions, building up a full programming language requires a much more complex set of tricks.
To parse a language of this complexity, you are probably best off using a parser generator rather than trying to write your own by hand. ANTLR and Java CUP are two well-known tools for doing precisely what you're interested in accomplishing, and I would strongly suggest using one of the two of them.
Hope this helps!
For simple languages (this is a judgement call, and if you are inexperienced you may not be able to make that call correctly), one can often write a recursive descent parser by hand that does well enough. The good news is that coding a recursive descent parser is pretty straightforward.
If you aren't sure, use overkill in the form of the strongest parser generator you can get.
in simple cases writing manually a parser makes sense.
However, using StringTokenizer is a indicator of doing it wrong, because a StringTokenizer IS already a SIMPLE parser.
a parser usually reads a char and changes its state depending on the value of that char.
Just a simple parser, a "b" make following char "uppercase", e to lowercase. "." stops
String input = "aDDbcDDeaaef.";
int pos = 0;
int state = 0;
while (pos < input.length()) {
char z = input.charAt (pos);
if (z == '.') break;
switch (z) {
case 'b': state = 1; break;
case 'e': state = 0; break;
default:
if (state == 0) {
System.out.print(Char.toLowerCase(z));
} else {
System.out.print(Char.toUpperCase(z));
}
}
pos ++;
}