I have to write a program that takes a user's chemical equation as input, like 12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2, and checks whether the number of atoms is the same on both sides. Is there an easy way to parse and calculate this?
For example:
12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2
12*2+6*2 -> 2*6+2*12+2*6+12*2
In this case the output should be "false".
This is my code so far, but it is really only an experiment:
public static void main(String[] args) {
    Scanner s = new Scanner(System.in);
    List<String> list = new ArrayList<String>();
    String input = "";
    // Read lines until the user types "end"
    while (!(input.equals("end"))) {
        input = s.nextLine();
        list.add(input);
    }
    // Drop the terminating "end" line
    list.remove(list.size() - 1);
    for (int i = 0; i < list.size(); i++) {
        int before = 0;
        int after = 0;
        String string = list.get(i);
        string = besserUmwandeln(string);
        System.out.println(string);
    }
}

public static String besserUmwandeln(String string) {
    string = string.replace("-", "");
    string = string.trim().replaceAll(("\\s+"), " ");
    string = string.replace(' ', '*');
    StringBuilder builder = new StringBuilder(string);
    System.out.println(string);
    // Mark uppercase letters with ':' and lowercase letters with '.'
    for (int k = 0; k < builder.length(); k++) {
        if (Character.isUpperCase(builder.charAt(k))) {
            builder.setCharAt(k, ':');
        }
        if (Character.isLowerCase(builder.charAt(k))) {
            builder.setCharAt(k, '.');
        }
        if (Character.isDigit(builder.charAt(k))) {
        } else {
        }
    }
    // Bound is length() - 1 so that charAt(j + 1) cannot run past the end
    for (int j = 0; j < builder.length(); j++) {
        if (j < builder.length() - 1 && builder.charAt(j) == ':' && builder.charAt(j + 1) == '.') {
            builder.deleteCharAt(j + 1);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == ':' && builder.charAt(i + 1) == ':') {
            builder.deleteCharAt(i);
        }
    }
    // Remove the '*' separators around '+' and '>'
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '+' && builder.charAt(i + 1) == '*') {
            builder.deleteCharAt(i + 1);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '*' && builder.charAt(i + 1) == '+') {
            builder.deleteCharAt(i);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '*' && builder.charAt(i + 1) == '>') {
            builder.deleteCharAt(i);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '>' && builder.charAt(i + 1) == '*') {
            builder.deleteCharAt(i + 1);
        }
    }
    for (int i = 0; i < builder.length(); i++) {
        if (i < builder.length() - 1 && builder.charAt(i) == '*' && builder.charAt(i + 1) == ':') {
            builder.deleteCharAt(i + 1);
        }
    }
    return builder.toString();
}
The way a proper parser, such as one generated by ANTLR, works is to 1) convert the text into a stream of lexical tokens, then 2) parse the tokens with lookahead into a parse tree.
Lookahead is useful to know when to "end" a particular structural level of parsing.
For your requirements, you might be able to skip the distinction between lexing and parsing and just parse from the text directly -- however, an appreciation and use of lookahead would potentially be useful.
In particular, a buffer that holds the upcoming (remaining) text, tests matches (e.g. regex) against it, and consumes matches from the front could be useful. This could be implemented either by modifying the remaining string or by advancing an index within it.
Given such a buffer, your pseudocode might look like:
This is conceptual example code, not tested & does not include the buffer or complete parser -- it is the reader's job to flesh these out to a complete solution.
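One way to flesh this out, as a minimal and non-authoritative sketch: the buffer is the input string plus an index, peek is the lookahead test, and consume removes a match from the front. The class name LookaheadParser, its method names, and the grammar (optional coefficient, element symbols made of one capital letter plus an optional lowercase letter, no parentheses) are assumptions made for illustration. They are not part of ANTLR.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class: a hand-written parser driven by lookahead.
final class LookaheadParser {
    private static final Pattern NUMBER  = Pattern.compile("\\d+");
    private static final Pattern ELEMENT = Pattern.compile("[A-Z][a-z]?");
    private static final Pattern PLUS    = Pattern.compile("\\+");
    private static final Pattern ARROW   = Pattern.compile("->");

    private final String text; // the full input
    private int pos = 0;       // index of the first unconsumed character

    private LookaheadParser(String text) { this.text = text; }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    /** Lookahead: does the remaining text (after whitespace) start with p? */
    private boolean peek(Pattern p) {
        skipSpaces();
        return p.matcher(text).region(pos, text.length()).lookingAt();
    }

    /** Consumes a match of p from the front of the remaining text, or fails. */
    private String consume(Pattern p) {
        skipSpaces();
        Matcher m = p.matcher(text).region(pos, text.length());
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("Expected " + p + " at index " + pos);
        }
        pos = m.end();
        return m.group();
    }

    /** side := term ('+' term)* ; returns the atom count per element. */
    private Map<String, Integer> parseSide() {
        Map<String, Integer> counts = new TreeMap<>();
        parseTerm(counts);
        while (peek(PLUS)) { // lookahead decides whether another term follows
            consume(PLUS);
            parseTerm(counts);
        }
        return counts;
    }

    /** term := [coefficient] (element [subscript])+ */
    private void parseTerm(Map<String, Integer> counts) {
        int coefficient = peek(NUMBER) ? Integer.parseInt(consume(NUMBER)) : 1;
        do {
            String element = consume(ELEMENT);
            int subscript = peek(NUMBER) ? Integer.parseInt(consume(NUMBER)) : 1;
            counts.merge(element, coefficient * subscript, Integer::sum);
        } while (peek(ELEMENT)); // the molecule ends when no element follows
    }

    /** equation := side '->' side ; true if both sides have identical counts. */
    static boolean isBalanced(String equation) {
        LookaheadParser p = new LookaheadParser(equation);
        Map<String, Integer> left = p.parseSide();
        p.consume(ARROW);
        Map<String, Integer> right = p.parseSide();
        p.skipSpaces();
        if (p.pos != equation.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        }
        return left.equals(right);
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2")); // false
        System.out.println(isBalanced("6 CO2 + 6 H2O -> C6H12O6 + 6 O2"));     // true
    }
}
```

Advancing an index was chosen here over modifying the remaining string because it avoids copying the input on every match. Both variants expose the same peek/consume interface.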
This question is asking for a simple parser for a simple type of equation. I am assuming that you do not need to support all kinds of irregular equations with parentheses and weird symbols.
Just to be safe, I would use a lot of String.split() instead of regexes.
A (relatively) simple solution would do the following:
1. Split the equation on -> to get its two sides.
2. Split each side on + to get the individual terms.
Each level of parsing can be handily done in a separate method. Using regex is probably the best way to parse the individual molecules, so I borrowed the expression from here: https://codereview.stackexchange.com/questions/2345/simplify-splitting-a-string-into-alpha-and-numeric-parts. The regex is pretty much trivial, so please bear with me:
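A minimal sketch of this layered approach, with one method per level. Only the method name parse comes from this answer. The class name EquationChecker, the helper names parseSide and parseTerm, and the TERM pattern are assumptions made for illustration. The sketch also assumes properly written element symbols (one capital letter, optionally followed by one lowercase letter), so its molecule pattern is ([A-Z][a-z]?)\\s*([0-9]*).

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class: split on "->", then on "+", then regex per molecule.
final class EquationChecker {
    // Element symbol followed by an optional count, e.g. "C6", "H12", "O".
    private static final Pattern ATOM = Pattern.compile("([A-Z][a-z]?)\\s*([0-9]*)");
    // Optional leading coefficient followed by the formula, e.g. "12 CO2".
    private static final Pattern TERM = Pattern.compile("([0-9]*)\\s*(.+)");

    /** Top level: true if both sides contain the same number of atoms per element. */
    static boolean parse(String equation) {
        String[] sides = equation.split("->");
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected exactly one -> in: " + equation);
        }
        return parseSide(sides[0]).equals(parseSide(sides[1]));
    }

    /** Second level: one side of the equation, terms separated by '+'. */
    static Map<String, Integer> parseSide(String side) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String term : side.split("\\+")) {
            parseTerm(term.trim(), counts);
        }
        return counts;
    }

    /** Third level: a single term such as "2 C6H12O6". */
    static void parseTerm(String term, Map<String, Integer> counts) {
        Matcher t = TERM.matcher(term);
        if (!t.matches()) {
            throw new IllegalArgumentException("Cannot parse term: '" + term + "'");
        }
        int coefficient = t.group(1).isEmpty() ? 1 : Integer.parseInt(t.group(1));
        Matcher atom = ATOM.matcher(t.group(2));
        while (atom.find()) {
            int n = atom.group(2).isEmpty() ? 1 : Integer.parseInt(atom.group(2));
            counts.merge(atom.group(1), coefficient * n, Integer::sum);
        }
    }

    public static void main(String[] args) {
        String equation = "12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2";
        String[] sides = equation.split("->");
        System.out.println(parseSide(sides[0]) + " -> " + parseSide(sides[1]));
        System.out.println(parse(equation)); // false
    }
}
```

Validation is deliberately thin in this sketch: characters that the molecule pattern does not match are skipped rather than reported.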
All the work is done by the parse method and its subordinates, which form a sort of virtual call tree. Since this approach makes it especially easy to make sure that the atoms of each element are actually balanced out, I have gone ahead and done that here. This class prints the counts of the atoms on each side of the equation, whether or not the raw counts balance out, as well as whether or not they match by element type. Here are a couple of example runs:
OP's original example:
Added ozone to make the number of atoms match up:
Added water to make everything match up:
Notice that I added a space between C and O in CO2. This is because my current regex for molecules, ([a-zA-Z]+)\\s*([0-9]*), allows any combination of letters to represent an element. If your elements are always going to be simple one-letter elements, change this to ([a-zA-Z])\\s*([0-9]*) (remove the + quantifier). If they are going to be properly named, two-letter combinations with the second letter always lowercase, do this instead: ([A-Z][a-z]?)\\s*([0-9]*). I recommend the latter option. For both modified versions, the space in C O2 will no longer be necessary.
Every time I need to parse some text with Java, I mostly end up just using regex, so I'd recommend you do the same.
You can test regular expressions at regex101.com.
You can also easily use it in Java:
Inside a regex you can define capturing groups with ( and ) and then grab the results with matcher.group(int).
For example, you may first separate the equation using (.*) -> (.*).
Then loop over the left and right groups using find with: (\d+) (\w+)(?: \+| -|$).
After that you can use group(1) for the amount and group(2) for the molecule.
If needed, also iterate over the second group (the molecule) to get the exact element distribution using (\w)(\d?). The first group is then the element. For example, the text CO2 yields two hits: the first hit has group(1) -> C and an empty second group, and the second hit has group(1) -> O and group(2) -> 2.
Test your regex here: regex101#Q6KMJo
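Put together, the steps above might look like the following minimal sketch. The class name RegexBalance and the method names countAtoms and isBalanced are hypothetical. The first two patterns are used as quoted. For the per-element step the sketch swaps in ([A-Z][a-z]?)(\d*) instead of (\w)(\d?), because the quoted pattern would split a two-digit count such as the 12 in C6H12O6 into separate hits.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class: capturing groups with matches() and find().
final class RegexBalance {
    // Splits the equation into group(1) = left side, group(2) = right side.
    private static final Pattern EQUATION = Pattern.compile("(.*) -> (.*)");
    // group(1) = amount, group(2) = molecule. As quoted, an amount is required.
    private static final Pattern TERM = Pattern.compile("(\\d+) (\\w+)(?: \\+| -|$)");
    // Assumption: swapped in for (\w)(\d?) so that multi-digit counts stay whole.
    private static final Pattern ATOM = Pattern.compile("([A-Z][a-z]?)(\\d*)");

    /** Sums the atoms per element for one side, e.g. "12 CO2 + 6 H2O". */
    static Map<String, Integer> countAtoms(String side) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher term = TERM.matcher(side);
        while (term.find()) { // one hit per "amount molecule" pair
            int amount = Integer.parseInt(term.group(1));
            Matcher atom = ATOM.matcher(term.group(2));
            while (atom.find()) { // one hit per element inside the molecule
                int n = atom.group(2).isEmpty() ? 1 : Integer.parseInt(atom.group(2));
                counts.merge(atom.group(1), amount * n, Integer::sum);
            }
        }
        return counts;
    }

    /** True if both sides of the equation hold the same atoms per element. */
    static boolean isBalanced(String equation) {
        Matcher m = EQUATION.matcher(equation);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected 'left -> right': " + equation);
        }
        return countAtoms(m.group(1)).equals(countAtoms(m.group(2)));
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2")); // false
        System.out.println(isBalanced("6 CO2 + 6 H2O -> 1 C6H12O6 + 6 O2"));   // true
    }
}
```

Because the term pattern requires a leading amount, a molecule written without one (for example C6H12O6 instead of 1 C6H12O6) is not matched by this sketch. Making (\d+) optional would be one way to relax that.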