MarkLogic 9 cts.parse not parsing queries correctly

Posted 2019-07-18 19:03

I am working on a web-based search application using MarkLogic 9. I have a query-building interface that lets users enter strings into text boxes that correspond to particular JSON properties of the documents in the database. The idea was that the user could enter the search terms exactly as cts.parse expects them (I use server-side JavaScript, not XQuery), so that their searches could be arbitrarily complex and I would not have to parse the queries myself. However, after some testing, I have discovered an odd phenomenon regarding the use of parentheses in the Boolean logic: when you include parentheses in a statement like cat AND (dog OR bird), cts.parse mistakes the OR for a search term.

I will provide an actual example from my website:

I have constructed a bindings object to bind the queries to the JSON properties of my documents:

// Options applied to every word query built by the binding.
var qOpts = ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded"];

// Bind the "main" tag to a function that searches three JSON properties.
var bindings = {
	main: function(operator, values, options){
		return cts.orQuery([
			cts.jsonPropertyWordQuery('title', values, qOpts),
			cts.jsonPropertyWordQuery('abstract', values, qOpts),
			cts.jsonPropertyWordQuery('meshterms', values, qOpts)
		]);
	}
};

My server-side scripts call, for example,

cts.parse('main:'+params.mainQuery, bindings)

Here are some examples of the strings entered and the queries returned:

  1. brain OR heart OR lung

cts.orQuery([cts.jsonPropertyWordQuery("title", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("abstract", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("meshterms", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.wordQuery("heart", ["lang=en"], 1), cts.wordQuery("lung", ["lang=en"], 1)], [])

This one properly generates the jsonPropertyWordQuery for the three properties (title, abstract, meshterms) for the "brain" term, but fails to do so for the other two terms, for which it simply generates a cts.wordQuery().

  2. brain OR heart AND lung

cts.orQuery([cts.jsonPropertyWordQuery("title", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("abstract", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("meshterms", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.andQuery([cts.wordQuery("heart", ["lang=en"], 1), cts.wordQuery("lung", ["lang=en"], 1)], ["unordered"])], [])

  3. brain OR (heart AND lung)

cts.orQuery([cts.jsonPropertyWordQuery("title", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("abstract", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("meshterms", "brain", ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.andQuery([cts.wordQuery("heart", ["lang=en"], 1), cts.wordQuery("lung", ["lang=en"], 1)], ["unordered"])], [])

Queries 2 and 3 appear to produce the same result. The first part correctly generates a jsonPropertyWordQuery, but the other terms are ANDed as basic word queries, which I am trying to avoid.

  4. (brain OR heart) AND lung

cts.andQuery([cts.orQuery([cts.jsonPropertyWordQuery("title", ["brain", "OR", "heart"], ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("abstract", ["brain", "OR", "heart"], ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1), cts.jsonPropertyWordQuery("meshterms", ["brain", "OR", "heart"], ["case-insensitive","punctuation-insensitive","whitespace-insensitive","wildcarded","lang=en"], 1)], []), cts.wordQuery("lung", ["lang=en"], 1)

Here, the parser does not seem to recognize that OR is an operator because, even though it is correctly generating jsonPropertyWordQueries, it is including OR as a term in the search.

Honestly, I am having trouble finding any query that is parsed correctly, which leads me to believe that I must be doing something wrong, but I have no idea where. Am I misusing cts.parse or the bindings object?

Any help would be greatly appreciated.

2 answers
狗以群分
#2 · 2019-07-18 19:31

As Mary also explains, you should read main:cat OR dog as (main:cat) OR dog. You could do something like Erik suggests and rewrite the query before parsing to main:cat OR main:dog, or you could parse cat OR dog (without the prefix) and then post-process the cts:query tree, expanding each cts:word-query into the three-way or-query you're after. A recursive function using a typeswitch should do the trick for that.
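A rough sketch of the rewrite-before-parsing option, in plain JavaScript so it can be tried outside MarkLogic. The names prefixTerms and OPERATORS are made up for illustration, and the operator list is an assumption that covers only the common keywords; other operators or prefixes in the cts.parse grammar are not handled here.

```javascript
// Keywords that are passed through untouched. This list is an assumption;
// extend it to match the operators your users are allowed to type.
const OPERATORS = new Set(['AND', 'OR', 'NOT', 'NOT_IN', 'NEAR', 'BOOST']);

// Prefix every bare term or quoted phrase in `input` with `tag:`.
function prefixTerms(input, tag) {
  // Tokens: a parenthesis, a double-quoted phrase, or a run of other characters.
  const tokens = input.match(/[()]|"[^"]*"|[^\s()"]+/g) || [];
  let result = '';
  tokens.forEach(function (tok, i) {
    let piece;
    if (tok === '(' || tok === ')') {
      piece = tok;                       // grouping stays as typed
    } else if (OPERATORS.has(tok) || /^NEAR\/\d+$/.test(tok)) {
      piece = tok;                       // operator keyword
    } else if (tok[0] !== '"' && tok.indexOf(':') !== -1) {
      piece = tok;                       // already carries a tag
    } else {
      piece = tag + ':' + tok;           // bare term or quoted phrase
    }
    // No space right after "(" or right before ")".
    if (i > 0 && tok !== ')' && tokens[i - 1] !== '(') result += ' ';
    result += piece;
  });
  return result;
}

console.log(prefixTerms('(brain OR heart) AND lung', 'main'));
// prints: (main:brain OR main:heart) AND main:lung
```

On the server the result would then be handed to the parser unchanged, for example cts.parse(prefixTerms(params.mainQuery, 'main'), bindings) instead of concatenating 'main:' onto the front of the whole string.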

HTH!

家丑人穷心不美
#3 · 2019-07-18 19:34

It isn't clear to me what your exact query string is.

If the query string is something like "main:(cat OR dog)" then the OR is not a keyword in that context. What is allowed after a tag is pretty limited and is not the full range of the query language; it is just a list of literals.

If the query string is something like "main:cat OR dog" then the scope of the tag is just cat.

It isn't unreasonable to want the () after a tag to scope an entire query now that you can bind a function to the tag (it made no sense when a tag was fixed to a range index or field), but that isn't how the grammar is set up.

So you'll just have to do things piecemeal: main:cat OR main:dog

Or: given the set of values passed into your function, space-concatenate them and pass that to a separate call to cts:parse to get them interpreted as a query you can wrap.
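One possible reading of that second option, as a sketch only. It assumes the bound function receives either a single string or an array of strings (as the outputs in the question suggest), and it re-tags each literal before the second parse so the nested terms still hit the three properties. makeBindings, OPERATOR_WORDS and the injected cts parameter are hypothetical helpers; cts is passed in only so the logic can be exercised outside MarkLogic, and on the server you would pass the global cts object.

```javascript
// Words treated as operators when they appear in the value list (an assumption).
const OPERATOR_WORDS = new Set(['AND', 'OR', 'NOT']);

// Build a bindings object whose "main" tag re-parses multi-value input.
function makeBindings(cts, fields, wordOptions) {
  const bindings = {
    main: function (operator, values, options) {
      const list = [].concat(values);   // accept a string or an array
      if (list.length === 1) {
        // Single literal: build the per-property OR directly.
        return cts.orQuery(fields.map(function (field) {
          return cts.jsonPropertyWordQuery(field, list[0], wordOptions);
        }));
      }
      // Several literals, e.g. ["brain", "OR", "heart"]:
      // re-tag each non-operator, space-concatenate, and parse again.
      const requery = list.map(function (v) {
        if (OPERATOR_WORDS.has(v)) return v;
        const literal = /\s/.test(v) ? '"' + v + '"' : v;  // keep phrases together
        return 'main:' + literal;
      }).join(' ');
      return cts.parse(requery, bindings);
    }
  };
  return bindings;
}
```

With the options from the question this would be used as cts.parse('main:(' + params.mainQuery + ')', makeBindings(cts, ['title', 'abstract', 'meshterms'], qOpts)). Note one behavioral difference: a plain list such as main:(brain heart) is re-parsed as two separate terms, so it follows the parser's default combination of adjacent terms rather than matching any of the listed words.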
