I have thousands of log files, and new ones are downloaded every day. I am using Logstash and Elasticsearch for parsing, indexing, and searching.
I am using the file input plugin to read and parse the downloaded files. I have not set sincedb_path, so it is stored in $HOME. The problem is that Logstash only reads log files from the last day. Here is my input configuration:
input {
  file {
    path => "/logs/downloads/apacheLogs/env1/**/*"
    type => "env1"
    exclude => "*.gz"
    start_position => "beginning"
  }
  file {
    path => "/logs/downloads/appLogs/env2/**/*"
    type => "env2"
    exclude => "*.gz"
    start_position => "beginning"
  }
}
This is apparently caused by a bug in the File handler.
When the file {} input reads a log file, the offset of the last byte processed is saved and periodically copied out to the sincedb file. While you can set that file to /dev/null if you want, Logstash only reads it during startup and uses the in-memory table afterwards.
The problem is that the table in memory indexes position by inode, and is never pruned, even if it detects that a given file no longer exists. If you delete a file and then add a new one -- even if it has a different name -- it may well have the same inode number, and the File handler will think it is the same file.
If the new file is larger, then the handler will only read from the previous max byte onwards and update the table. If the new file is smaller, then it seems to think the file was somehow truncated, and may start processing again from the default position.
As a result, the only way to handle this is to set sincedb_path to /dev/null and then restart Logstash (causing the internal table to be lost), so that all files matching the pattern are re-read from the beginning. This has problems as well, since some of the files may not be new and will be indexed again.
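A minimal sketch of that workaround, reusing the first input block from the question (sincedb_path is a real option of the file input; the path and other settings are copied from the question):

input {
  file {
    path => "/logs/downloads/apacheLogs/env1/**/*"
    type => "env1"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"  # discard saved offsets; a restart re-reads everything from the beginning
  }
}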
Does Logstash print any error message?
One possible problem: on a Linux system, there is a per-user limit on the number of open files. Logstash opens every file matching the input path (/logs/downloads/apacheLogs/env1/**/*), so once your log files exceed that limit, Logstash cannot open and read any new ones. You can check your system settings in /etc/security/limits.conf.
Edit:
After you modify limits.conf, you need to log out and log back in for the new limit to take effect.
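For example, assuming Logstash runs as a user named "logstash" (the username and values here are placeholders; adjust them to your environment), entries like these in /etc/security/limits.conf raise the per-user open-file limit:

# /etc/security/limits.conf
logstash  soft  nofile  65536
logstash  hard  nofile  65536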
This is because of "ignore_older". By default it is set to 86400 seconds, i.e. 1 day, so any file in the given path that has not been modified within the last day is ignored. You can set ignore_older => 0 so that all files are read; see the sketch below.
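For example, applied to the first input block from the question (the value is in seconds; 0 disables the cutoff, as this answer suggests):

input {
  file {
    path => "/logs/downloads/apacheLogs/env1/**/*"
    type => "env1"
    exclude => "*.gz"
    start_position => "beginning"
    ignore_older => 0  # disable the modification-time cutoff so no file is skipped
  }
}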
You can find more at https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html