Is it possible to add custom metadata to a Lucene

2019-04-13 17:42发布

问题:

I've come to the point where I need to store some additional data about where a particular field comes from in my Lucene.Net index. Specifically, I want to attach a guid to certain fields of a document when the field is added to the document, and retrieve it again when I get the document from a search result.

Is this possible?

Edit: Okay, let me clarify a bit by giving an example.

Let's say I have an object that I want to allow the user to tag with custom tags like "personal", "favorite", "some-project". I do this by adding multiple "tag" fields to the document, like so:

doc.Add( new Field( "tag", "personal" ) );
doc.Add( new Field( "tag", "favorite" ) );

The problem is I now need to record some meta data about each individual tag itself, specifically a guid representing where that tag came from (imagine it as a user id). Each tag could potentially have a different guid, so I can't simply create a "tag-guid" field (unless the order of the values is preserved---see edit 2 below). I don't need this metadata to be indexed (and in fact I'd prefer it not to be, to avoid getting hits on metadata), I just need to be able to retrieve it again from the document/field.

doc.GetFields( "tag" )[0].Metadata...

(I'm making up syntax here, but I hope my point is clear now.)

Edit 2: Since this is a completely different question, I've posted a new question for this approach: Is the order of multi-valued fields in Lucene stable?

Okay let's try another approach... The key problem area is the indeterminacy of the multiple field values under the same field name (e.g. "tag"). If I could introduce or obtain some kind of determinacy here, I might be able to store the metadata in another field.

For example, if I could rely on the order of the values of the field never changing, I could use an index in the set of values to identify exactly which tag I am referring to.

Is there any guarantee that the order I add the values to a field will remain the same when I retrieve the document at a later time?

回答1:

Depending on your search requirements for this index, this may be possible. That way you can control the order of fields. It would require updating both fields as the tag list changes of course, but the overhead may be worth it.

doc.Add(new Field("tags", "{personal}|{favorite}")); 
doc.Add(new Field("tagsref", "{1234}|{12345}")); 

Note: using the {} allows you to qualify your search for uniqueness where similar values exist.

Example: If values were stored as "person|personal|personage" searching for "person" would return a document that has any one of person, personal or personage. By qualifying in curly brackets like so: "{person}|{personal}|{personage}", I can search for "{person}" and be sure it won't return false positives. Of course, this assumes you don't use curly brackets in your values.



回答2:

I think you're asking about payloads.

Edit: From your use case, it sounds like you have no desire to use this metadata in your search, you just want it there. (Basically, you want to use Lucene as a database system.)

So, why can't you use a binary field?

ExtraData ed = new ExtraData { Tag = "tag", Type = "personal" };
byte[] byteData = BinaryFormatter.Serialize(ed); // this isn't the correct code, but you get the point
doc.Add(new Field("myData", byteData, Field.Store.YES));

Then you can deserialize it on retrieval.