So I'm using a standard ELK stack to analyse Apache access logs, which is working well, but I'm looking to break out URL parameters as fields, using the kv filter, so that I can write better queries.
My problem is that the app I'm analysing has 'cache-busting' dynamically generated parameters, which leads to tens of thousands of 'fields', each occurring once. Elasticsearch seems to have severe trouble with this, and since the fields have no value to me I'd like to remove them. Below is an example of the pattern:
GET /page?rand123PQY=ABC&other_var=something
GET /page?rand987ZDQ=DEF&other_var=something
In the example above, the parameters I want to remove start with 'rand'. Currently my logstash.conf uses grok to extract fields from the access logs, followed by kv to extract the query-string parameters:
input {
  file {
    # path and type belong on the file input, not the grok filter
    path => "/var/log/apache/access.log"
    type => "apache-access"
  }
}
filter {
  grok {
    # parse the combined access-log line; this populates "request" among other fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  kv {
    # split out key/value pairs on & and ?
    field_split => "&?"
  }
}
Is there a way I can filter out any fields matching the pattern rand[A-Z0-9]*=[A-Z0-9]*? Most examples I've seen target fields by exact name, which I can't use here. I did wonder about regexing the request field into a new field, running kv on that, and then removing it, something like the sketch below. Would that work?
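To make that concrete, here's a rough, untested sketch of what I was imagining (the request_clean field name is just something I made up for the scratch copy):

filter {
  mutate {
    # copy the request into a scratch field I can safely mangle
    add_field => { "request_clean" => "%{request}" }
  }
  mutate {
    # strip the cache-busting parameters from the copy
    gsub => [ "request_clean", "rand[A-Z0-9]*=[A-Z0-9]*&?", "" ]
  }
  kv {
    # extract the remaining query-string parameters from the cleaned copy
    source => "request_clean"
    field_split => "&?"
  }
  mutate {
    # drop the scratch field once kv has run
    remove_field => [ "request_clean" ]
  }
}

Is that a sane approach, or is there a cleaner way to do this?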