I have a server that sends access logs over to logstash in a custom log format, and am using logstash to filter these logs and send them to Elastisearch.
A log line looks something like this:
0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n
And gets parsed using this grok filter:
grok {
match => [
"message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
"message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - %{NUMBER:content_length} %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
"message", "%{IP:remote_host} %{USER:remote_user} %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}",
"message", "%{IP:remote_host} - %{WORD:method} %{URIPATHPARAM:requested_uri} %{NUMBER:status_code} - - %{NUMBER:elapsed_time:int} ms %{GREEDYDATA:user_agent}"
]
add_field => {
"protocol" => "HTTP"
}
}
The final log gets parsed into this object (with real IPs stubbed out, and other fields taken out):
{
"_source": {
"message": " 0.0.0.0 - GET / 200 - 29771 3 ms ELB-HealthChecker/1.0\n",
"tags": [
"bunyan"
],
"@version": "1",
"host": "0.0.0.0:0000",
"remote_host": [
"0.0.0.0",
"0.0.0.0"
],
"remote_user": [
"-",
"-"
],
"method": [
"GET",
"GET"
],
"requested_uri": [
"/",
"/"
],
"status_code": [
"200",
"200"
],
"content_length": [
"29771",
"29771"
],
"elapsed_time": [
"3",
3
],
"user_agent": [
"ELB-HealthChecker/1.0",
"ELB-HealthChecker/1.0"
],
"protocol": [
"HTTP",
"HTTP"
]
}
}
Any ideas why I am getting multiple matches per log? Shouldn't Grok be breaking on the first match that successfully parses?
Chances are you have multiple config files that are being loaded. If you look at the output, specifically the
elapsed_time
shows up as both an integer and a string. From the config file you've provided, that's not possible since you have:int
on anything that matcheselapsed_time
.