Phrase query in Lucene 6.2.0

Posted 2019-09-14 07:42

Question:

I have a document like this:

{ 
    "_id" : ObjectId("586b723b4b9a835db416fa26"), 
    "name" : "test", 
    "countries" : {
        "country" : [
            {
                "name" : "russia iraq"
            }, 
            {
                "name" : "USA china"
            }
        ]
    }
}

The document is stored in MongoDB, and I am trying to retrieve it using a phrase query (Lucene 6.2.0). My code looks as follows:

StandardAnalyzer analyzer = new StandardAnalyzer();

// 1. create the index
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
try {
    IndexWriter w = new IndexWriter(index, config);
    MongoClient client = new MongoClient("localhost", 27017);
    DB database = client.getDB("test123");
    DBCollection coll = database.getCollection("test1");
    //MongoCollection<org.bson.Document> collection = database.getCollection("test1");
    DBCursor cursor = coll.find();
    System.out.println(cursor);
    while (cursor.hasNext()) {
        BasicDBObject obj = (BasicDBObject) cursor.next();

        Document doc = new Document();
        BasicDBObject f = (BasicDBObject) (obj.get("countries"));
        List<BasicDBObject> dts = (List<BasicDBObject>) (f.get("country"));
        doc.add(new TextField("id", obj.get("_id").toString().toLowerCase(), Field.Store.YES));
        doc.add(new StringField("name", obj.get("name").toString(), Field.Store.YES));
        doc.add(new StringField("countries", f.toString(), Field.Store.YES));

        for (BasicDBObject d : dts) {
            doc.add(new StringField("country", d.get("name").toString(), Field.Store.YES));
        }
        w.addDocument(doc);
    }
    w.close();

and my search looks like this:

    PhraseQuery query = new PhraseQuery("country", "iraq russia");

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);

    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs docs = searcher.search(query, hitsPerPage);
    ScoreDoc[] hits = docs.scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for (int j = 0; j < hits.length; ++j) {
        int docId = hits[j].doc;
        Document d = searcher.doc(docId);
        System.out.println(d);
    }

    reader.close();
}
catch (Exception e) {
    e.printStackTrace();
}

I am getting zero hits for this query. Can anyone tell me what I am doing wrong? Jars used: lucene-queries-4.2.0, lucene-queryparser-6.2.1, lucene-analyzers-common-6.2.0

Answer 1:

I made some changes to the query, as follows:

Query query = new PhraseQuery.Builder()
                        .add(new Term("country", "iraq"))
                        .add(new Term("country", "russia"))
                        .setSlop(2)
                        .build();

I also changed the field type used while indexing:

for (BasicDBObject d : dts) {
    doc.add(new TextField("country", d.get("name").toString(), Field.Store.YES));
}

But can anyone tell me the difference between StringField and TextField while indexing?
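
On that last question: in Lucene, StringField indexes the entire value as a single term without analyzing it, while TextField passes the value through the analyzer and indexes each resulting token. The following is a minimal plain-Java sketch of that difference, not Lucene code. The class FieldModel and its methods are hypothetical, and lowercasing plus whitespace splitting is only an approximation of what StandardAnalyzer does for simple input like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of the two field types; not part of Lucene.
public class FieldModel {

    // Models StringField: the whole value becomes exactly one term, unchanged.
    public static List<String> indexAsStringField(String value) {
        return Collections.singletonList(value);
    }

    // Models TextField with a simple analyzer (assumption: lowercase,
    // then split on whitespace).
    public static List<String> indexAsTextField(String value) {
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(normalized.split("\\s+")));
    }

    public static void main(String[] args) {
        String value = "russia iraq";

        List<String> keywordTerms = indexAsStringField(value);
        List<String> textTerms = indexAsTextField(value);

        System.out.println(keywordTerms);                  // [russia iraq]
        System.out.println(textTerms);                     // [russia, iraq]

        // A lookup for the single term "iraq" only succeeds on the tokenized field.
        System.out.println(keywordTerms.contains("iraq")); // false
        System.out.println(textTerms.contains("iraq"));    // true
    }
}
```

In this model a term or phrase query on "iraq" can never find anything in the StringField version, because the only indexed term is the full string "russia iraq". That is consistent with the zero hits in the question and with the switch to TextField above.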



Answer 2:

Firstly, never mix Lucene versions. All your jars should be the same version. Upgrade lucene-queries to 6.2.1. In practice you might or might not run into trouble mixing up 6.2.0 and 6.2.1, but you definitely should upgrade lucene-analyzers-common as well.


PhraseQuery doesn't analyze for you; you have to add the terms to it separately. In your example, "iraq russia" is treated as a single term, rather than as two separate (analyzed) terms.

It should look something like this:

Query query = new PhraseQuery.Builder()
    .add(new Term("country", "iraq"))
    .add(new Term("country", "russia"))
    .build();
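
One thing worth checking with this query: with the default slop of 0, a PhraseQuery only matches terms that occur in the given order at consecutive positions, so "iraq" followed by "russia" would not be expected to match a field whose text is "russia iraq". The sketch below models exact phrase matching over token positions in plain Java. It is a simplified illustration rather than Lucene's implementation, and PhraseModel and matchesExactPhrase are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of exact (slop 0) phrase matching; not part of Lucene.
public class PhraseModel {

    // Returns true if phraseTerms occur in tokens in the same order
    // at consecutive positions.
    public static boolean matchesExactPhrase(List<String> tokens, List<String> phraseTerms) {
        if (phraseTerms.isEmpty() || phraseTerms.size() > tokens.size()) {
            return false;
        }
        for (int start = 0; start + phraseTerms.size() <= tokens.size(); start++) {
            if (tokens.subList(start, start + phraseTerms.size()).equals(phraseTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tokens as an analyzed field would hold them for the value "russia iraq".
        List<String> indexed = Arrays.asList("russia", "iraq");

        // Query terms in the opposite order: no exact match.
        System.out.println(matchesExactPhrase(indexed, Arrays.asList("iraq", "russia"))); // false

        // Query terms in the indexed order: exact match.
        System.out.println(matchesExactPhrase(indexed, Arrays.asList("russia", "iraq"))); // true
    }
}
```

In Lucene, swapping two adjacent terms requires a slop of at least 2, so either adding .setSlop(2) to the builder (as in the first answer) or listing the terms in indexed order (russia, then iraq) should let the phrase match.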

If you want something that will analyze for you, you can use the QueryParser:

QueryParser parser = new QueryParser("country", new StandardAnalyzer());
Query query = parser.parse("\"iraq russia\"");