I would like to understand a user's search term. Suppose someone searches for "staples in NY": I would like to recognize that it is a location search where the keyword is "staples" and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag it as a location search; here the entire keyword is "cat in hat". Is there an algorithm or open source library that can parse a search term and determine whether it is a comparison (like A vs B) or a location-based search (like A in X)?
Not too sure, but here are two approaches from my experience with parsing:
Define a grammar that can parse the expression and collect values/parameters. You might want to build a dictionary of keywords from which you can then deduce the type of search.
Be strict when defining your grammar so that the expression itself tells you the type of search, e.g. LOC: A in B, VALUE $ to Euro, etc.
For parser generators, see ANTLR, or CUP with JFlex.
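The strict-grammar idea can be sketched without a parser generator. The snippet below is a minimal hand-written stand-in for what ANTLR or CUP/JFlex would generate, not their output: the `LOC` tag comes from the example above, while the `CMP` tag, the `GRAMMAR` table, and the `parse_tagged` function are hypothetical names chosen for illustration.

```python
# Hypothetical strict grammar: the query must start with a tag naming the search type.
#   LOC: <keyword> in <location>
#   CMP: <left> vs <right>      (CMP is an assumed tag, not from the original answer)
GRAMMAR = {
    "LOC": ("in", "keyword", "location"),
    "CMP": ("vs", "left", "right"),
}


def parse_tagged(query):
    """Return a dict describing a tagged query, or None if it does not fit the grammar."""
    tag, sep, body = query.partition(":")
    tag = tag.strip().upper()
    if not sep or tag not in GRAMMAR:
        return None  # no tag, or an unknown tag

    separator, first_name, second_name = GRAMMAR[tag]
    tokens = body.split()
    lowered = [token.lower() for token in tokens]
    if separator not in lowered:
        return None  # the tag promised a separator word that is missing

    # Split on the first occurrence of the separator word.
    index = lowered.index(separator)
    first, second = tokens[:index], tokens[index + 1:]
    if not first or not second:
        return None  # both sides of the separator must be non-empty

    return {"type": tag, first_name: " ".join(first), second_name: " ".join(second)}
```

Because the tag is mandatory, an untagged query such as "cat in hat" is never treated as a location search; the cost is that users (or an upstream layer) must supply the tag.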
You should write such linguistic rules in a grammar framework such as GATE or http://code.google.com/p/graph-expression/. Example: Token+ in (LocationLookup).
The problem you describe is called information extraction. A host of algorithms exist, the simplest being regexp matching and the best being structured machine learning. Try regexps first, and look at something like NLTK if you know Python.
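As a rough illustration of the "regexps first" suggestion, here is a minimal sketch for the comparison case (A vs B) from the question. The pattern and the `parse_comparison` name are assumptions made for this example; the accepted separators ("vs", "vs.", "versus") are a guess at what users type.

```python
import re

# Assumed pattern: two non-empty sides separated by "vs", "vs." or "versus".
COMPARISON = re.compile(
    r"^\s*(?P<left>.+?)\s+(?:vs\.?|versus)\s+(?P<right>.+?)\s*$",
    re.IGNORECASE,
)


def parse_comparison(query):
    """Return (left, right) if the query looks like a comparison, else None."""
    match = COMPARISON.match(query)
    if match is None:
        return None
    return match.group("left"), match.group("right")
```

A query with no separator word, such as "cat in hat", simply fails to match and falls through to whatever handles plain keyword searches.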
Distinguishing "staples in NY" from "cat in hat" is possible if your program knows that "NY" is a location. You can tell either from the capitalization or because "NY" occurs in a list of place names called a gazetteer.
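A gazetteer lookup might look like the sketch below. The tiny `GAZETTEER` dictionary and the `parse_location` function are hypothetical; a real gazetteer would be a much larger list loaded from a data source, and this version only checks the text after the word "in".

```python
# Hypothetical, deliberately tiny gazetteer: lowercase place name -> canonical name.
GAZETTEER = {
    "ny": "New York",
    "nyc": "New York",
    "new york": "New York",
    "london": "London",
}


def parse_location(query):
    """Split 'A in X' into keyword and location, but only if X is a known place."""
    tokens = query.split()
    for index, token in enumerate(tokens):
        # Require a non-empty keyword before "in" and a non-empty place after it.
        if token.lower() == "in" and 0 < index < len(tokens) - 1:
            place = " ".join(tokens[index + 1:]).lower()
            if place in GAZETTEER:
                return {
                    "keyword": " ".join(tokens[:index]),
                    "location": GAZETTEER[place],
                }
    # No known place found: treat the whole query as the keyword.
    return {"keyword": " ".join(tokens), "location": None}
```

With this, "staples in NY" yields the keyword "staples" and the location "New York", while "cat in hat" stays a plain keyword search because "hat" is not in the gazetteer.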
The problem in general is AI-complete, so expect to put in lots of hard work if you want good results.