Documents lost between Logstash and ElasticSearch

Posted 2020-04-30 19:15

I'm running a basic ELK stack, with all three components running in one VM. Logstash is listening on TCP 9140 for its input, receiving Windows event logs (evts) from about 30 Windows Server 2008 and 30 Windows Server 2003 machines via NXLog agents, and it outputs to Elasticsearch.

This had been running perfectly for a couple of weeks. I could see Elasticsearch creating indices for each day and could browse documents, create graphs; all good.

After a weekend I realized that at some point on Friday, around 9pm, all new events had stopped. There were no network issues, and all servers were still shipping their logs. All I could see was a lot of errors/warnings about the Watcher plugin (Elastic) being out of license, but nothing interesting. I could even see logs showing new indices for the new day being created automatically.

So I removed the plugin, restarted Elasticsearch, and all was good again. I don't think the plugin was the actual problem, though; I think Elasticsearch was hung.

I have two questions:

1) How should I troubleshoot these conditions (all services up, no documents being indexed)?

2) If Logstash is up and accepting input but Elasticsearch is down, what happens to the events shipped from my Windows servers? From NXLog's point of view, those logs were correctly sent to Logstash, so there would be no reason to retry, and those logs would be lost "forever"?

Thanks! Rodrigo.

1 answer

Luminary・发光体
Answered 2020-04-30 20:06

If ES is down or hung for whatever reason (too busy garbage-collecting, etc.), then Logstash will retry a few times and then give up, which means you'll lose those events.

There are many ways to alleviate this, but a good practice is to store the events durably (either in a database or a message-queueing system) and remove them only once they have been successfully sent to ES. Such messaging systems include Redis (using lists or channels as queues), Apache Kafka (a distributed log), RabbitMQ (distributed message queues), etc.

There are plenty of ways to configure these technologies together with Logstash; one common example is using Logstash with Kafka. In your case, this would mean that NXLog ships its logs to Kafka instead of directly to Logstash, and Logstash then consumes logs from a Kafka topic.
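As a rough illustration of that pipeline, a minimal Logstash configuration might look like the sketch below. The broker address, topic name, and index pattern are all assumptions for the example, and the exact option names vary between versions of the Kafka input plugin, so check the documentation for your release:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed Kafka broker address
    topics => ["windows-events"]            # hypothetical topic NXLog ships to
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]             # assumed local ES node
  }
}
```

With this in place, Kafka holds the events durably, so if Elasticsearch goes down, Logstash simply stops consuming and resumes from where it left off once ES is back.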

Your mileage will vary, but the main idea is that your logs will not be lost if they cannot be sent to Elasticsearch. That answers your second question.

As for your first one, I would advise installing other ES plugins, like bigdesk and HQ, and/or the official Marvel plugin, all of which provide deep insight into what is currently going on inside Elasticsearch. You'll be able to quickly detect where the issues are and take action.
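Beyond the plugins, you can detect the "all services up, nothing indexed" condition yourself by periodically polling a document count (for example from Elasticsearch's `_cat/count` API) and alerting when it stops growing. Here is a minimal sketch of the detection logic; the counts in the demo are made up, and how you fetch them (curl, cron, a monitoring agent) is up to you:

```python
from collections import deque

def make_stall_detector(window=3):
    """Return a checker that reports a stall after `window` consecutive
    polls with no growth in the document count.

    Counts are assumed to come from something like:
        GET /_cat/count/logstash-2020.04.30?h=count
    """
    counts = deque(maxlen=window + 1)

    def check(count):
        counts.append(count)
        if len(counts) <= window:
            return False  # not enough history to judge yet
        # Stalled if the count never grew across the whole window.
        return counts[-1] <= counts[0]

    return check

# Demo with made-up poll values: four identical counts trigger an alert.
check = make_stall_detector(window=3)
print([check(c) for c in [100, 100, 100, 100]])  # → [False, False, False, True]
```

Wired to a cron job or monitoring system, this catches the exact failure mode described above, where every process is "up" but no documents are being indexed.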
