ElasticSearch Painless script: How to iterate in a

Posted 2020-06-08 19:22

Question:

I am trying to write a script for the script_score function of a function_score query. I have several documents whose rankings field has type "nested". The mapping for the field is:

"rankings": {
        "type": "nested",
        "properties": {
          "rank1": {
            "type": "long"
          },
          "rank2": {
            "type": "float"
          },
          "subject": {
            "type": "text"
          }
        }
      }

A sample document is:

"rankings": [
{
    "rank1": 1051,
    "rank2": 78.5,
    "subject": "s1"
},
{
    "rank1": 45,
    "rank2": 34.7,
    "subject": "s2"
}]

What I want to achieve is to iterate over the nested objects of rankings. Specifically, I need a loop (for example, a for loop) to find a particular subject and then use its rank1 and rank2 values to compute something. So far I have the following, but it does not work (it throws a compile error):

"function_score": {
"script_score": {
    "script": {
        "lang": "painless",
        "inline": 
                 "sum = 0;"
                 "for (item in doc['rankings_cug']) {"
                     "sum = sum + doc['rankings_cug.rank1'].value;"
                 "}"
         }
    }
}

I have also tried the following options:

  1. A for loop using : instead of in, i.e. for (item:doc['rankings']), with no success.
  2. A for loop using in, but iterating over a specific element of the object, i.e. rank1: for (item in doc['rankings.rank1'].values). This actually compiles, but it seems to find a zero-length array for rank1.

I have read that the _source element is the one that can return JSON-like objects, but as far as I can tell it is not supported in search queries.

Can you please give me some ideas on how to proceed?

Thanks a lot.

Answer 1:

You can access _source via params._source. This one will work:

PUT /rankings/result/1?refresh
{
  "rankings": [
    {
      "rank1": 1051,
      "rank2": 78.5,
      "subject": "s1"
    },
    {
      "rank1": 45,
      "rank2": 34.7,
      "subject": "s2"
    }
  ]
}

POST rankings/_search
{
  "query": {
    "match": {
      "_id": "1"
    }
  },
  "script_fields": {
    "script_score": {
      "script": {
        "lang": "painless",
        "inline": "double sum = 0.0; for (item in params._source.rankings) { sum += item.rank2; } return sum;"
      }
    }
  }
}

DELETE rankings
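
Since the question asks for a script_score that picks out one subject, the same params._source loop can be adapted along the following lines. This is only a sketch: the params.subject parameter, the match_all inner query, and the rank1 + rank2 formula are assumptions chosen for illustration and are not part of the original answer. It also assumes every matching document has a rankings array. On newer Elasticsearch versions the "inline" key is named "source".

```
POST rankings/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": {
          "lang": "painless",
          "inline": "double sum = 0.0; for (item in params._source.rankings) { if (item.subject == params.subject) { sum += item.rank1 + item.rank2; } } return sum;",
          "params": { "subject": "s1" }
        }
      }
    }
  }
}
```

Keep in mind that reading _source from a script means loading the stored source for every scored document, so this is likely to be noticeably slower than a script that only reads doc values.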


Answer 2:

Unfortunately, Elasticsearch scripting in general (including Painless) does not support accessing nested documents in this way. Consider a different structure for your mappings, in which the rankings are stored in multi-valued fields, if you need to iterate across them like this. Ultimately, the nested data will need to be denormalized and put into the parent documents to be able to compute scores in the way described here.
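
One possible way to denormalize, sketched under assumptions: rather than the multi-valued fields mentioned above, this variant stores each subject's ranks under its own key of a plain object field, so that a script can read them through doc values. The index name rankings_flat, the field name rankings_by_subject, and the per-subject keys are all hypothetical and do not come from the question. The request style follows Answer 1. This layout only suits a small, known set of subjects, because every subject adds fields to the mapping.

```
PUT /rankings_flat/result/1?refresh
{
  "rankings_by_subject": {
    "s1": { "rank1": 1051, "rank2": 78.5 },
    "s2": { "rank1": 45, "rank2": 34.7 }
  }
}

POST rankings_flat/_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "script_score": {
        "script": {
          "lang": "painless",
          "inline": "if (doc['rankings_by_subject.s1.rank1'].size() == 0 || doc['rankings_by_subject.s1.rank2'].size() == 0) { return 0.0; } return doc['rankings_by_subject.s1.rank1'].value + doc['rankings_by_subject.s1.rank2'].value;"
        }
      }
    }
  }
}
```

The size() checks are there so that documents without ranks for the chosen subject score 0.0 instead of failing. The subject is hard-coded into the field path here, so a different subject needs a different script string.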



Answer 3:

For nested objects in an array, I iterated over the items and it worked. The following is my sample data in the Elasticsearch index:

{
  "_index": "activity_index",
  "_type": "log",
  "_id": "AVjx0UTvgHp45Y_tQP6z",
  "_version": 4,
  "found": true,
  "_source": {
    "updated": "2016-12-11T22:56:13.548641",
    "task_log": [
      {
        "week_end_date": "2016-12-11",
        "log_hours": 16,
        "week_start_date": "2016-12-05"
      },
      {
        "week_start_date": "2016-03-21",
        "log_hours": 0,
        "week_end_date": "2016-03-27"
      },
      {
        "week_start_date": "2016-04-24",
        "log_hours": 0,
        "week_end_date": "2016-04-30"
      }
    ],
    "created": "2016-12-11T22:56:13.548635",
    "userid": 895,
    "misc": {

    },
    "current": false,
    "taskid": 1023829
  }
}

Here is the "Painless" script to iterate over nested objects:

{
  "script": {
    "lang": "painless",
    "inline":
        "boolean contains(def x, def y) {
           for (item in x) {
             if (item['week_start_date'] == y) {
               return true;
             }
           }
           return false;
         }
         if (!contains(ctx._source.task_log, params.start_time_param)) {
           ctx._source.task_log.add(params.week_object);
         }",
    "params": {
      "start_time_param": "2016-04-24",
      "week_object": {
        "week_start_date": "2016-04-24",
        "week_end_date": "2016-04-30",
        "log_hours": 0
      }
    }
  }
}

I used the above script in an update request to /activity_index/log/AVjx0UTvgHp45Y_tQP6z/_update. The script defines a function called contains that takes two arguments, and then calls it. The old Groovy style, ctx._source.task_log.contains(), will not work, since ES 5.X stores nested objects in a separate document. Hope this helps!