I have an ElasticSearch object with these fields:
    [Keyword]
    public List<string> Tags { get; set; }

    [Text]
    public string Title { get; set; }
Previously, I got the top Tags across all documents using this code:
    var Match = Driver.Search<Metadata>(_ => _
        .Query(Q => Q
            .Term(P => P.Category, (int)Category)
            && Q.Term(P => P.Type, (int)Type))
        .FielddataFields(F => F.Fields(F1 => F1.Tags, F2 => F2.Title))
        .Aggregations(A => A.Terms("Tags", T => T.Field(F => F.Tags)
            .Size(Limit))));
But with Elastic 5.1, I get an error 400 with this hint:
Fielddata is disabled on text fields by default. Set fielddata=true on [Tags] in order to load fielddata in memory by uninverting the inverted index.
The ES documentation on mapping parameters says of enabling fielddata that "It usually doesn’t make sense to do so" and recommends to "have a text field for full text searches, and an unanalyzed keyword field with doc_values enabled for aggregations".
But the only documentation page I can find for this is for 5.0; the same page for 5.1 seems not to exist.
Now, 5.1 has a page about Terms Aggregation that seems to cover what I need, but I can find absolutely nothing for C# / Nest that I can use.
So, I'm trying to figure out how I can just get the top words, across all documents, from the Tags (where each tag is its own word; for example "New York" is not "New" and "York") and the title (where each word is its own thing) in C#.
I need to edit this post because there seems to be a deeper problem. I wrote some test code that illustrates the issue:
Let's create a simple object:
    public class MyObject
    {
        [Keyword]
        public string Id { get; set; }

        [Text]
        public string Category { get; set; }

        [Text(Fielddata = true)]
        public string Keywords { get; set; }
    }
create the index:
    var Uri = new Uri(Constants.ELASTIC_CONNECTIONSTRING);
    var Settings = new ConnectionSettings(Uri)
        .DefaultIndex("test")
        .DefaultFieldNameInferrer(_ => _)
        .InferMappingFor<MyObject>(_ => _.IdProperty(P => P.Id));
    var D = new ElasticClient(Settings);
fill the index with random stuff:
    for (var i = 0; i < 10; i++)
    {
        var O = new MyObject
        {
            Id = i.ToString(),
            Category = (i % 2) == 0 ? "a" : "b",
            Keywords = (i % 3).ToString()
        };
        D.Index(O);
    }
and do the query:
    var m = D.Search<MyObject>(s => s
        .Query(q => q.Term(P => P.Category, "a"))
        .Source(f => f.Includes(si => si.Fields(ff => ff.Keywords)))
        .Aggregations(a => a
            .Terms("Keywords", t => t
                .Field(f => f.Keywords)
                .Size(Limit)
            )
        )
    );
It fails the same way as before, with a 400 and:
Fielddata is disabled on text fields by default. Set fielddata=true on [Keywords] in order to load fielddata in memory by uninverting the inverted index.
But Fielddata is set to true on [Keywords], and it still complains about it.
So, let's get crazy and modify the class this way:
    public class MyObject
    {
        [Text(Fielddata = true)]
        public string Id { get; set; }

        [Text(Fielddata = true)]
        public string Category { get; set; }

        [Text(Fielddata = true)]
        public string Keywords { get; set; }
    }
That way everything is a Text field and everything has Fielddata = true. Well, same result.
So, either I am really not understanding something simple, or it's broken or not documented :)
You rarely want Fielddata. For your particular search, where you want to return just the tags and title fields from the search query, take a look at Source Filtering instead.
Fielddata needs to uninvert the inverted index into an in-memory structure for aggregations and sorting. Whilst accessing this data can be very fast, it can also consume a lot of memory for a large data set.
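Source filtering, as suggested above, might look like the following minimal sketch. It assumes NEST 5.x, a node at a hypothetical local address, a hypothetical index name "metadata", and that the index was created with Tags mapped as keyword so the terms aggregation needs no Fielddata. The Category and Type filters from the question are left out because those properties are not shown on the POCO.

```csharp
using System;
using System.Collections.Generic;
using Nest;

public class Metadata
{
    [Keyword]
    public List<string> Tags { get; set; }

    [Text]
    public string Title { get; set; }
}

public static class SourceFilteringSketch
{
    public static void Main()
    {
        // Assumptions: the node address and the index name are placeholders.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("metadata");
        var client = new ElasticClient(settings);

        var response = client.Search<Metadata>(s => s
            // Return only Tags and Title from _source instead of loading fielddata.
            .Source(src => src
                .Includes(i => i.Fields(f => f.Tags, f => f.Title)))
            // Tags is a keyword field, so it aggregates on doc_values.
            .Aggregations(a => a
                .Terms("Tags", t => t
                    .Field(f => f.Tags)
                    .Size(10))));

        if (!response.IsValid)
        {
            Console.WriteLine(response.DebugInformation);
            return;
        }

        // Each bucket is one whole tag (for example "New York") with its document count.
        foreach (var bucket in response.Aggs.Terms("Tags").Buckets)
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
    }
}
```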
EDIT:
Within your edit, I don't see anywhere where you create the index and explicitly map your MyObject POCO; without explicitly creating the index and mapping the POCO, Elasticsearch will automatically create the index and infer the mapping for MyObject based on the first JSON document that it receives, meaning Keywords will be mapped as a text field with a keyword multi_field and Fielddata will not be enabled on the text field mapping.
Here's an example to demonstrate it all working
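A minimal sketch of such an example follows, assuming NEST 5.x and a node at a hypothetical local address. It creates the "test" index explicitly and uses AutoMap() so that the attributes on MyObject, including Fielddata = true, end up in the mapping before any document is indexed. Note that it deletes any existing "test" index first.

```csharp
using System;
using Nest;

public class MyObject
{
    [Keyword]
    public string Id { get; set; }

    [Text]
    public string Category { get; set; }

    [Text(Fielddata = true)]
    public string Keywords { get; set; }
}

public static class ExplicitMappingSketch
{
    public static void Main()
    {
        // Assumption: the node address is a placeholder for your own.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("test")
            .DefaultFieldNameInferrer(n => n)
            .InferMappingFor<MyObject>(m => m.IdProperty(p => p.Id));
        var client = new ElasticClient(settings);

        // Start clean so a previously inferred mapping cannot linger.
        if (client.IndexExists("test").Exists)
            client.DeleteIndex("test");

        // AutoMap() turns the [Keyword]/[Text] attributes into the index mapping.
        client.CreateIndex("test", c => c
            .Mappings(m => m
                .Map<MyObject>(mm => mm.AutoMap())));

        for (var i = 0; i < 10; i++)
        {
            client.Index(new MyObject
            {
                Id = i.ToString(),
                Category = (i % 2) == 0 ? "a" : "b",
                Keywords = (i % 3).ToString()
            });
        }

        // Make the documents visible to search straight away.
        client.Refresh("test");

        var response = client.Search<MyObject>(s => s
            .Query(q => q.Term(p => p.Category, "a"))
            .Source(src => src.Includes(i => i.Fields(f => f.Keywords)))
            .Aggregations(a => a
                .Terms("Keywords", t => t
                    .Field(f => f.Keywords)
                    .Size(10))));

        if (!response.IsValid)
        {
            Console.WriteLine(response.DebugInformation);
            return;
        }

        foreach (var bucket in response.Aggs.Terms("Keywords").Buckets)
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
    }
}
```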
This returns
You might also consider mapping Keywords as a text field with a keyword multi_field, using the text field for unstructured search and the keyword for sorting, aggregations and structured search. This way, you get the best of both worlds and don't need to enable Fielddata. Then, in search, use the keyword sub-field
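The multi_field approach could be sketched as follows, again assuming NEST 5.x and a hypothetical local node. The sub-field name "keyword" is a choice made for this illustration; the aggregation then targets Keywords.keyword through NEST's Suffix() extension, and no Fielddata is needed.

```csharp
using System;
using Nest;

public class MyObject
{
    [Keyword]
    public string Id { get; set; }

    [Text]
    public string Category { get; set; }

    [Text]
    public string Keywords { get; set; }
}

public static class MultiFieldSketch
{
    public static void Main()
    {
        // Assumption: the node address is a placeholder for your own.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("test")
            .DefaultFieldNameInferrer(n => n)
            .InferMappingFor<MyObject>(m => m.IdProperty(p => p.Id));
        var client = new ElasticClient(settings);

        if (client.IndexExists("test").Exists)
            client.DeleteIndex("test");

        client.CreateIndex("test", c => c
            .Mappings(m => m
                .Map<MyObject>(mm => mm
                    .AutoMap()
                    // Override the inferred mapping for Keywords:
                    // text for full-text search, plus a keyword sub-field.
                    .Properties(p => p
                        .Text(t => t
                            .Name(n => n.Keywords)
                            .Fields(f => f
                                .Keyword(k => k.Name("keyword"))))))));

        var response = client.Search<MyObject>(s => s
            .Query(q => q.Term(p => p.Category, "a"))
            .Aggregations(a => a
                .Terms("Keywords", t => t
                    // Resolves to the "Keywords.keyword" sub-field.
                    .Field(f => f.Keywords.Suffix("keyword"))
                    .Size(10))));

        if (!response.IsValid)
        {
            Console.WriteLine(response.DebugInformation);
            return;
        }

        foreach (var bucket in response.Aggs.Terms("Keywords").Buckets)
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount}");
    }
}
```

With no documents indexed the aggregation simply returns no buckets; index documents and refresh the index first to see counts.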