This is pseudo homework (it's extra credit). I've got a BST which is an index of words that point to the lines (stored somewhere else) that contain the words. I need to implement a way to search using s-expressions so I can combine and (&) and or (|).
At the command prompt a user could type something like:
QUERY ((((fire)&(forest))|((ocean)&(boat)))&(water))
Essentially that should return all lines that contain the words fire, forest and water as well as all lines that contain ocean, boat and water.
What I really need help with is the logic for parsing and inserting nodes into the tree to properly represent the expression more than the actual code. The only thing I have worked out that makes sense to me is returning a set of lines for each word in the expression. Then depending on if it's an "or" or "and" operation I would perform a union or intersection type operation on those sets to create a new set and pass that on up the tree.
I am kind of lost on how to parse the line that contains the expression. After some thought it appears that the "farther" out one of the sub-expressions is the higher it should be in my s-expression tree? I think if I could just get a push in the right direction as far as parsing and inserting the expressions in the tree I should be OK.
My sample tree that I came up with for the query above looks something like;
&
/ \
| water
/ \
& &
/ \ / \
fire forest ocean boat
This makes sense as fire would return a set of lines that all contain fire and forest would return a set of lines that all contain forest. Then at the "&" level I would take those two sets and create another set that contained only the lines that were in both sets thus giving me a set that only has lines which contain both fire and forest.
My other stumbling block is how to represent everything in the tree after I overcome the hurdle of parsing. I have an ExpTreeNode class that will serve as the nodes for my ExpTree(the BST) and then I have 2 subclasses, operator and operand, but I'm not sure if this is a good approach.