Parsing and building S-Expressions using Sets and

Posted 2020-08-07 02:37

Question:

This is pseudo homework (it's extra credit). I've got a BST which is an index of words that point to the lines (stored somewhere else) that contain the words. I need to implement a way to search using s-expressions so I can combine words with AND (&) and OR (|).

At the command prompt a user could type something like:

QUERY ((((fire)&(forest))|((ocean)&(boat)))&(water))

Essentially that should return all lines that contain the words fire, forest and water as well as all lines that contain ocean, boat and water.

What I really need help with is the logic for parsing and inserting nodes into the tree to properly represent the expression more than the actual code. The only thing I have worked out that makes sense to me is returning a set of lines for each word in the expression. Then depending on if it's an "or" or "and" operation I would perform a union or intersection type operation on those sets to create a new set and pass that on up the tree.
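As a rough illustration of that idea, here is a minimal Python sketch. The `index` dictionary, its line numbers, and the `lines_for` helper are made up for the example; in the real program the sets would come from the word index.

```python
# Hypothetical index: word -> set of line numbers that contain the word.
index = {
    "fire": {1, 2, 5},
    "forest": {2, 5, 7},
    "water": {2, 3, 9},
}

def lines_for(word):
    """Return the set of line numbers for a word (empty set if unknown)."""
    return index.get(word, set())

# "&" maps to set intersection, "|" maps to set union.
both = lines_for("fire") & lines_for("forest")
either = lines_for("fire") | lines_for("forest")

# The result of one operation is itself a set, so it can be combined again.
result = both & lines_for("water")
```

Because every operation takes sets and returns a set, the same two operations work at every level of the tree.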

I am kind of lost on how to parse the line that contains the expression. After some thought, it appears that the farther out a sub-expression is, the higher it should sit in my s-expression tree? I think if I could just get a push in the right direction as far as parsing and inserting the expressions in the tree, I should be OK.

My sample tree that I came up with for the query above looks something like this:

                                            &
                                         /     \
                                       |       water
                                   /      \
                                 &          &
                               /   \        /   \
                            fire  forest  ocean boat

This makes sense as fire would return a set of lines that all contain fire and forest would return a set of lines that all contain forest. Then at the "&" level I would take those two sets and create another set that contained only the lines that were in both sets thus giving me a set that only has lines which contain both fire and forest.

My other stumbling block is how to represent everything in the tree after I overcome the hurdle of parsing. I have an ExpTreeNode class that will serve as the nodes for my ExpTree (the BST), and then I have two subclasses, operator and operand, but I'm not sure if this is a good approach.
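One way the operator/operand split could look is sketched below in Python. This is only an illustration of the design described above, not a required layout: the `evaluate` method, the constructor arguments, and the `sample_index` data are assumptions made for the example.

```python
class ExpTreeNode:
    """Base class: every node can evaluate itself to a set of line numbers."""
    def evaluate(self, index):
        raise NotImplementedError

class Operand(ExpTreeNode):
    """Leaf node holding a single word."""
    def __init__(self, word):
        self.word = word

    def evaluate(self, index):
        # Copy so callers cannot mutate the index by accident.
        return set(index.get(self.word, set()))

class Operator(ExpTreeNode):
    """Inner node holding '&' or '|' and two children."""
    def __init__(self, symbol, left, right):
        if symbol not in ("&", "|"):
            raise ValueError(f"unknown operator: {symbol!r}")
        self.symbol = symbol
        self.left = left
        self.right = right

    def evaluate(self, index):
        left = self.left.evaluate(index)
        right = self.right.evaluate(index)
        return left & right if self.symbol == "&" else left | right

# The sample tree from the question, built by hand.
tree = Operator("&",
    Operator("|",
        Operator("&", Operand("fire"), Operand("forest")),
        Operator("&", Operand("ocean"), Operand("boat"))),
    Operand("water"))

# Made-up index data for the example.
sample_index = {
    "fire": {1, 2}, "forest": {2, 3},
    "ocean": {4, 5}, "boat": {5, 6},
    "water": {2, 5, 7},
}
matches = tree.evaluate(sample_index)
```

Evaluation is a post-order walk: each operator asks its children for their sets first and then combines them, which is the "pass it on up the tree" step.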

Answer 1:

Dijkstra has done it for you already :-)

Try the shunting yard algorithm: http://en.wikipedia.org/wiki/Shunting-yard_algorithm

You can create the RPN (Reverse Polish notation) using the shunting yard algorithm, and once that is created, you can make a pass through it to create the binary tree.
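A minimal sketch of that first step in Python, for the `&` / `|` query syntax from the question. The tokenizer regex, the function names, and the choice to let `&` bind tighter than `|` are assumptions for this example; with fully parenthesized input the precedence never comes into play.

```python
import re

# Assumption: "&" binds tighter than "|" when parentheses do not decide.
PRECEDENCE = {"&": 2, "|": 1}

def tokenize(query):
    """Split a query into parentheses, operators, and words."""
    return re.findall(r"[()&|]|[^\s()&|]+", query)

def to_rpn(query):
    """Convert an infix query to a list of RPN tokens (shunting yard)."""
    output, ops = [], []
    for tok in tokenize(query):
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            # Flush operators back to the matching "(".
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise ValueError("unbalanced ')'")
            ops.pop()  # discard the "("
        elif tok in PRECEDENCE:
            # Left-associative: pop operators of equal or higher precedence.
            while (ops and ops[-1] != "("
                   and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]):
                output.append(ops.pop())
            ops.append(tok)
        else:
            output.append(tok)  # a word goes straight to the output
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        output.append(op)
    return output

rpn = to_rpn("((((fire)&(forest))|((ocean)&(boat)))&(water))")
# rpn is ['fire', 'forest', '&', 'ocean', 'boat', '&', '|', 'water', '&']
```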

Normally, the RPN is used to do the evaluation, but you can actually create a tree.

For instance, instead of evaluating, you create tree nodes and push them onto the stack.

So if you see node1, node2, operator, you create a new node

   Operator
   /     \
  node1   node2

and push it back onto the stack.

A more detailed example:

Say the expression is (apples AND oranges) OR kiwis

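An equivalent RPN for this is kiwis oranges apples AND OR. (The shunting yard algorithm itself would emit apples oranges AND kiwis OR; both orderings give the same result because AND and OR are commutative.)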

Now walk this while maintaining a stack.

Make a node out of kiwis and push it onto the stack. Do the same for oranges, then for apples.

So the stack is (top first)

Node:Apples
Node:Oranges
Node:Kiwis

Now you see the AND in the RPN.

You pop the top two from the stack and create a new Node with AND as parent.

Node:AND, [Node:Apples, Node:Oranges]

This is basically the tree

       AND
     /    \
  Apples  Oranges

Now push this node onto the stack.

So the stack is

Node:AND, [Node:Apples, Node:Oranges]
Node:Kiwis

Now you see the OR in the RPN, so you create a node with OR as the parent and Node:AND and Node:Kiwis as its children, giving the tree

           OR 
         /   \
       AND   Kiwis
     /    \
  Apples  Oranges
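The walk above can be sketched in a few lines of Python. Nodes are plain tuples of the form `(operator, left, right)` and leaves are strings; that representation and the name `tree_from_rpn` are choices made for this example. To match the drawings, the first node popped becomes the left child, which is an assumption that only matters for operators that are not commutative.

```python
OPERATORS = {"AND", "OR"}

def tree_from_rpn(tokens):
    """Build a binary expression tree from a list of RPN tokens."""
    stack = []
    for tok in tokens:
        if tok in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {tok} is missing an operand")
            first = stack.pop()   # top of the stack
            second = stack.pop()  # next one down
            # The new node replaces its two children on the stack.
            stack.append((tok, first, second))
        else:
            stack.append(tok)     # operand: push a leaf
    if len(stack) != 1:
        raise ValueError("malformed RPN: leftover operands")
    return stack[0]

tree = tree_from_rpn("kiwis oranges apples AND OR".split())
# tree is ('OR', ('AND', 'apples', 'oranges'), 'kiwis')
```

When the loop finishes, the single node left on the stack is the root of the tree.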

You might even be able to modify the shunting yard algorithm to create the tree, but dealing with the RPN seems easier.

Alternatively, you can try recursive descent parsing. What you are asking for is very common, and you will be able to find grammars, and even code, if you search the web.
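For comparison, here is a small recursive descent sketch in Python for the `&` / `|` syntax from the question. The grammar is an assumption made for this example: `expr := term (('&' | '|') term)*` and `term := '(' expr ')' | WORD`, with no precedence between the two operators, so mixed operators should be parenthesized as in the sample query. Nodes are tuples `(operator, left, right)` and leaves are strings.

```python
import re

def parse(query):
    """Parse a query string into a nested-tuple expression tree."""
    tokens = re.findall(r"[()&|]|[^\s()&|]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_expr():
        # expr := term (('&' | '|') term)*
        node = parse_term()
        while peek() in ("&", "|"):
            op = advance()
            node = (op, node, parse_term())
        return node

    def parse_term():
        # term := '(' expr ')' | WORD
        tok = advance()
        if tok == "(":
            node = parse_expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if tok is None or tok in (")", "&", "|"):
            raise ValueError(f"unexpected token: {tok!r}")
        return tok

    tree = parse_expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token: {peek()!r}")
    return tree

tree = parse("((((fire)&(forest))|((ocean)&(boat)))&(water))")
```

Each nested pair of parentheses becomes one recursive call, so the outermost sub-expression naturally ends up at the root of the tree.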

By the way, you just mean a binary tree, right? A BST (binary search tree) has an extra constraint...