Python elasticsearch DSL aggregation/metric of nes

2019-07-26 05:43发布

问题:

I'm trying to find the minimum (smallest) value in a 2-level nesting (separate minimum value per document).

So far I'm able to make an aggregation which counts the min value from all the nested values in my search results but without separation per document.

My example schema:

class MyExample(DocType):
    myexample_id = Integer()
    nested1 = Nested(
        properties={
            'timestamp': Date(),
            'foo': Nested(
                properties={
                    'bar': Float(),
                }
            )
        }
    )
    nested2 = Nested(
        multi=False,
        properties={
            'x': String(),
            'y': String(),
        }
    )

And this is how I'm searching and aggregating:

from elasticsearch_dsl import Search, Q

search = Search().filter(
    'nested', path='nested1', inner_hits={},
    query=Q(
        'range', **{
            'nested1.timestamp': {
                'gte': exampleDate1,
                'lte': exampleDate2
            }
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'x'},
    query=Q(
        'term', **{
            'nested2.x': x
        }
    )
).filter(
    'nested', path='nested2', inner_hits={'name': 'y'},
    query=Q(
        'term', **{
            'nested2.y': y
        }
    )
)

search.aggs.bucket(
    'nested1', 'nested', path='nested1'
).bucket(
    'nested_foo', 'nested', path='nested1.foo'
).metric(
    'min_bar', 'min', field='nested1.foo.bar'
)

Basically what I need to do is to get the min value for all the nested nested1.foo.bar values for each unique MyExample (they have unique myexample_id field)

回答1:

If you want minimum value per document then put all the nested buckets within a bucket terms aggregation over myexample_id field:

search.aggs..bucket(
  'docs', 'terms', field='myexample_id'
).bucket(
  'nested1', 'nested', path='nested1'
).bucket(
  'nested_foo', 'nested', path='nested1.foo'
).metric(
  'min_bar', 'min', field='nested1.foo.bar'
)

Note that this aggregation might be extremely expensive to calculate since it has to create a bucket for each document. For a use case like this it might be easier to compute the minimum on a per document basis as a script_field or in the app.