I have JSON in the form of:
[
  {
    "foo":"bar"
  }
]
I am trying to parse it with the json filter in Logstash, but it doesn't work. As far as I can tell, the json filter can't parse a JSON list (a top-level array). Can someone suggest a workaround?
UPDATE
My logs
IP - - 0.000 0.000 [24/May/2015:06:51:13 +0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium+S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT%2B05%3A30&events=%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%2C%22eA%22%3A%22app_activated%22%2C%22eTz%22%3A%22GMT%2B05%3A30%22%2C%22eT%22%3A%221432386324909%22%2C%22eL%22%3A%22packageName%22%7D%5D * "-" "-" "-"
URL decoded version of the above log is
IP - - 0.000 0.000 [24/May/2015:06:51:13 0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT+05:30&events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}] * "-" "-" "-"
Below is my config file for the above logs:
filter {
  urldecode {
    field => "message"
  }
  grok {
    match => ["message",'%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
  }
  kv {
    field_split => "&? "
  }
  json {
    source => "events"
  }
  geoip {
    source => "clientip"
  }
}
I need to parse the events field, i.e. events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]
I assume that you have your json in a file. You are right, you cannot use the json filter directly. You'll have to use the multiline codec and use the json filter afterwards.
The following config works for your given input. However, you might have to change it in order to properly separate your events. It depends on your needs and the json format of your file.
Logstash config:
input {
  file {
    codec => multiline {
      pattern => "^\]" # Change to separate events
      negate => true
      what => previous
    }
    path => ["/absolute/path/to/your/json/file"]
    start_position => "beginning"
    sincedb_path => "/dev/null" # This is just for testing
  }
}
filter {
  mutate {
    gsub => [ "message","\[",""]
    gsub => [ "message","\n",""]
  }
  json { source => message }
}
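To see what the multiline codec and the two gsub calls are doing, here is a rough plain-Ruby approximation that runs outside Logstash. The helper names `group_lines` and `strip_and_parse` are made up for this sketch, and `slice_before` only mimics the codec's grouping for this pattern; it is not how Logstash implements it.

```ruby
require 'json'

# Approximate the multiline codec: with negate => true and what => previous,
# every line that does NOT start with "]" is appended to the previous line,
# so a line starting with "]" begins a new event.
def group_lines(lines)
  lines.slice_before { |line| line.start_with?(']') }
       .map { |chunk| chunk.join("\n") }
end

# Mirror the mutate/gsub step (drop "[" and newlines), then parse the rest.
def strip_and_parse(message)
  JSON.parse(message.gsub('[', '').gsub("\n", ''))
end

lines = ['[', '{', '"foo":"bar"', '}', ']']
messages = group_lines(lines)
puts messages.first.inspect
puts strip_and_parse(messages.first).inspect
```

In this sketch the closing "]" ends up as a message of its own, which hints at why the pattern may need tuning for your file.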
UPDATE
After your update I guess I've found the problem. Apparently you get a jsonparsefailure because of the square brackets. As a workaround you could manually remove them. Add the following mutate filter after your kv and before your json filter:
mutate {
  gsub => [ "events","\]",""]
  gsub => [ "events","\[",""]
}
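A quick way to check what this gsub pair does to the field, sketched in plain Ruby outside Logstash. `strip_brackets` is a hypothetical helper, and the sample string is a shortened version of the events value from your log.

```ruby
require 'json'

# Same effect as the two gsub entries: remove every "]" and "[".
def strip_brackets(value)
  value.gsub(']', '').gsub('[', '')
end

events = '[{"eC":"appUpdate","eA":"app_activated"}]'

puts JSON.parse(events).class                  # Array (top-level list)
puts JSON.parse(strip_brackets(events)).class  # Hash (single object)
```

Note that this removes brackets everywhere in the string, including inside values, so it is only safe if your values never contain "[" or "]".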
UPDATE 2
Alright, assuming your input looks like this:
[{"foo":"bar"},{"foo":"bar1"}]
Here are 5 options:
Option a) ugly gsub
An ugly workaround would be another gsub:
gsub => [ "event","\},\{",","]
But this would merge all dictionaries into one and lose the boundaries between them, so I guess you don't want to do that.
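To make the loss concrete, here is a small plain-Ruby sketch of the same substitution. `merge_objects` is a hypothetical helper, and the sample input uses two different keys (not your real data) so the effect is easy to see.

```ruby
require 'json'

# Replace "},{" with "," and drop the surrounding brackets,
# which is what the gsub workaround amounts to.
def merge_objects(value)
  value.gsub('},{', ',').gsub('[', '').gsub(']', '')
end

merged = JSON.parse(merge_objects('[{"name":"Alex"},{"city":"NewYork"}]'))
puts merged.inspect
```

The result is a single hash, so you can no longer tell which key came from which dictionary.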
Option b) split
A better approach might be to use the split filter:
split {
  field => "event"
  terminator => ","
}
mutate {
  gsub => [ "event","\]",""]
  gsub => [ "event","\[",""]
}
json {
  source => "event"
}
This would generate multiple events (the first with foo = bar and the second with foo = bar1).
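The split-then-parse idea can be tried in plain Ruby, outside Logstash. This is only a sketch: `split_pieces` and `parse_pieces` are hypothetical helpers, and the second sample is a shortened version of the events value from your log.

```ruby
require 'json'

# Split on the terminator, then strip the brackets from each piece
# (the split and mutate steps).
def split_pieces(value, terminator = ',')
  value.split(terminator).map { |piece| piece.gsub(']', '').gsub('[', '') }
end

# Parse each piece; return nil for pieces that are not valid JSON on their own.
def parse_pieces(pieces)
  pieces.map do |piece|
    begin
      JSON.parse(piece)
    rescue JSON::ParserError
      nil
    end
  end
end

puts parse_pieces(split_pieces('[{"foo":"bar"},{"foo":"bar1"}]')).inspect
puts parse_pieces(split_pieces('[{"eC":"appUpdate","eA":"app_activated"}]')).inspect
```

The first input parses cleanly because each dictionary has a single key. In the second, the comma inside the dictionary also acts as a terminator, so neither piece is valid JSON; keep that in mind if your dictionaries have more than one key.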
Option c) mutate split
You might want to have all the values in one logstash event. You could use the mutate => split filter to generate an array and parse the json if an entry exists. Unfortunately you will have to set a conditional for each entry because logstash doesn't support loops in its config.
mutate {
  gsub => [ "event","\]",""]
  gsub => [ "event","\[",""]
  split => [ "event", "," ]
}
json {
  source => "event[0]"
  target => "result[0]"
}
if [event][1] {
  json {
    source => "event[1]"
    target => "result[1]"
  }
  if [event][2] {
    json {
      source => "event[2]"
      target => "result[2]"
    }
  }
  # You would have to specify more conditionals if you expect even more dictionaries
}
Option d) Ruby
Following your comment, I tried to find a Ruby way. The following works (after your kv filter):
mutate {
  gsub => [ "event","\]",""]
  gsub => [ "event","\[",""]
}
ruby {
  init => "require 'json'"
  code => "
    e = event['event'].split(',')
    ary = Array.new
    e.each do |x|
      hash = JSON.parse(x)
      hash.each do |key, value|
        ary.push( { key => value } )
      end
    end
    event['result'] = ary
  "
}
Option e) Ruby
Use this approach after your kv filter (without setting a mutate filter):
ruby {
  init => "require 'json'"
  code => "
    event['result'] = JSON.parse(event['event'])
  "
}
It will parse events like event=[{"name":"Alex","address":"NewYork"},{"name":"David","address":"NewJersey"}]
into:
"result" => [
    [0] {
           "name" => "Alex",
        "address" => "NewYork"
    },
    [1] {
           "name" => "David",
        "address" => "NewJersey"
    }
]
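The same JSON.parse call can be tried outside Logstash on the decoded events value from your log. This is a minimal sketch; `parse_events` is a hypothetical wrapper, not part of Logstash.

```ruby
require 'json'

# Parse the whole list in one go, as the ruby filter above does.
def parse_events(value)
  JSON.parse(value)
end

events = '[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated",' \
         '"eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]'

result = parse_events(events)
puts result.length    # 1
puts result[0]['eA']  # app_activated
```

Because the list is parsed as a whole, commas and multiple keys inside a dictionary are not a problem here. If you are on Logstash 5.0 or later, note that the ruby filter's event API uses event.get('event') and event.set('result', ...) instead of event['...'].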
Because of how the kv filter splits fields (a space is one of your field_split characters), this approach does not support whitespace inside the JSON. I hope you don't have any in your real inputs, do you?
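The whitespace problem can be reproduced with a rough plain-Ruby stand-in for the kv step. `kv_like` is a hypothetical helper that only imitates field_split => "&? " (split on "&", "?" or space); the real kv filter does more than this, and the sample input is made up.

```ruby
require 'json'

# Rough stand-in for kv with field_split => "&? ": split on any of the
# characters "&", "?" or space, then split each piece on the first "=".
def kv_like(message)
  message.split(/[&? ]/).each_with_object({}) do |pair, fields|
    key, value = pair.split('=', 2)
    fields[key] = value unless value.nil?
  end
end

fields = kv_like('model=Titanium S202&event=[{"name":"Alex","address":"New York"}]')
puts fields['event']

begin
  JSON.parse(fields['event'])
rescue JSON::ParserError => e
  puts "parse failed: #{e.class}"
end
```

The value of event is cut off at the first space, so what reaches JSON.parse is no longer valid JSON.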