This is a cross-post of https://groups.google.com/d/topic/google-appengine/97LY3Yfd_14/discussion
I'm working with the new full text search service in GAE 1.6.6 and I'm having trouble figuring out how to correctly escape my query strings before I pass them off to the search index. The docs mention that certain characters need to be escaped (namely the numeric operators), but they don't specify how the query parser expects the string to be escaped.
The issue I'm having is two-fold:
- Failing to escape the crap out of many characters (more than those that are hinted at in the docs) will cause the parser to raise a QueryException.
- When I've escaped the query to the point it won't raise, the numeric operators (>, <, >=, <=) no longer parse correctly (not factored into the search).
I set up a test where I feed string.printable into my_index.search() and found that it would raise QueryException on each of the "printable" control characters, which I'm now stripping out, as well as on things that would seem innocent, like asterisks, commas, parentheses, braces, and tildes. None of these are mentioned in the docs as needing to be escaped.
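A test like the one described above could be sketched roughly as follows. The helper name find_rejected_chars and its parameters are hypothetical; search_fn stands in for my_index.search, and error_type for whatever exception the SDK actually raises (QueryException in this post).

```python
import string


def find_rejected_chars(search_fn, chars=string.printable, error_type=Exception):
    # Call search_fn once per character and collect the ones that make it raise.
    rejected = []
    for ch in chars:
        try:
            search_fn(ch)
        except error_type:
            rejected.append(ch)
    return rejected
```

Passing the search function in as an argument keeps the loop runnable outside App Engine, for example against a stub that mimics the parser's behaviour.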
So far I've tried:
- cgi.escape()
- saxutils.escape() with a mapping of ASCII to URL-encoded equivalents (e.g. "," -> "%2C")
- saxutils.escape() with a mapping of ASCII to HTML-entity-encoded ASCII codes (e.g. "{" -> "&#123;")
- urllib.quote_plus()
I've gotten the best results so far using URL-style (%NN) replacements, but >, <, >=, and <= continue to fail to yield the expected results from the index.
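For illustration, here is what URL-style escaping does to a numeric comparison. The query string "price >= 10" is a made-up example. One possible explanation for the behaviour (an assumption, not something the docs confirm) is that once the operator characters are percent-encoded, the parser no longer sees any operator at all.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2, as used on GAE at the time

query = "price >= 10"  # hypothetical field name and value
escaped = quote_plus(query)
print(escaped)  # price+%3E%3D+10
```

The escaped string contains no ">" or "=" characters, so a whole-query escape cannot preserve the comparison.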
Also, and this doesn't really seem to have anything to do with the escaping issue, using NOT in front of a field = value type query doesn't seem to work as advertised either.
tl;dr
How should I be escaping my queries before sending them to the search service so that the parser doesn't raise QueryException and my query yields expected results?
As briefly explained in the documentation (https://developers.google.com/appengine/docs/python/search/overview#Query_Language_Overview), the query parameter is a string that should conform to our query language, which we should document better.
For now, I recommend wrapping your queries (or at least some of the words/terms) in double quotes. That way you can pass all printable characters except " and \. The following example shows the result.
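A minimal sketch of that approach, assuming the two characters to drop are the double quote and the backslash. The helper name quote_term is hypothetical, and the final call to the index is left as a comment because it only runs on App Engine.

```python
import string


def quote_term(text):
    # Drop the two characters that cannot appear inside a quoted term,
    # then wrap the remainder in double quotes.
    cleaned = text.replace('"', '').replace('\\', '')
    return '"%s"' % cleaned


query = quote_term(string.printable)
# On App Engine, the string would then go to the index, e.g.:
# my_index.search(query)
```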
And you can even pass non-printable characters.
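Along the same lines, a sketch that builds a quoted query from every ASCII code point, control characters included, again assuming that only the double quote and the backslash have to be removed. Whether the service accepts each of these characters is the claim above, not something this snippet verifies.

```python
# All 128 ASCII code points, control characters included.
all_ascii = ''.join(chr(i) for i in range(128))

# Strip the double quote and the backslash, then wrap in double quotes.
query = '"%s"' % all_ascii.replace('"', '').replace('\\', '')
# On App Engine: my_index.search(query)
```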
EDIT: Note that anything enclosed in double quotes is an exact match; that is, "foo bar" would match ...foo bar... but not ...bar foo...