I am using Logstash to index data from Postgres (jdbc input plugin) into Elasticsearch. There is no time-based information in the database. The Postgres table users to import has two columns: userid (unique) and uname. In Elasticsearch, the document _id is set to userid. I am exporting this data every hour using a cron schedule in Logstash.
input {
  jdbc {
    # connection settings (driver, URL, credentials) omitted for brevity
    schedule  => "0 */1 * * *"
    statement => "SELECT userid, uname FROM users"
  }
}
output {
  elasticsearch {
    hosts       => ["elastic_search_host"]
    index       => "user_data"
    document_id => "%{userid}"
  }
}
This Logstash config indexes data correctly, but only for the insert and update cases. If a user is deleted from the table, the corresponding document is not deleted from the Elasticsearch index. Can someone please help me with the delete case?
There is no out-of-the-box option in Logstash to achieve this.
As discussed in https://discuss.elastic.co/t/delete-elasticsearch-document-with-logstash-jdbc-input/47490, you can add a "status" (or "deleted") column and flag rows as deleted instead of physically removing them, then have Logstash turn that flag into a delete action against Elasticsearch, as sketched below.
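A minimal sketch of that approach, assuming a boolean deleted column in the users table (the column name and connection settings are placeholders, not part of your original setup):

input {
  jdbc {
    # connection settings omitted; assumes a boolean "deleted" column exists
    schedule  => "0 */1 * * *"
    statement => "SELECT userid, uname, deleted FROM users"
  }
}
output {
  if [deleted] {
    # row is flagged as deleted: remove the matching document
    elasticsearch {
      hosts       => ["elastic_search_host"]
      index       => "user_data"
      document_id => "%{userid}"
      action      => "delete"
    }
  } else {
    # normal insert/update path, same as before
    elasticsearch {
      hosts       => ["elastic_search_host"]
      index       => "user_data"
      document_id => "%{userid}"
    }
  }
}

Note that a flagged row has to stay in the table at least until the next scheduled run has processed it, otherwise the delete never reaches Elasticsearch.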
Another way would be to delete the index every hour and then let Logstash rebuild it from scratch. The downside is a brief window during which there is no data in Elasticsearch.
To avoid that window, you can instead configure Logstash to write to a new index every hour (for example, user_data-timestamp, as in the sketch below) and delete the older indices externally, e.g. with Curator.
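A sketch of that output, assuming an hourly index name pattern (user_data-%{+YYYY.MM.dd.HH} is an illustrative choice; pick whatever granularity matches your schedule) and with searches pointed at user_data-* or an alias:

output {
  elasticsearch {
    hosts       => ["elastic_search_host"]
    # new index per hour; older user_data-* indices are dropped externally (e.g. Curator)
    index       => "user_data-%{+YYYY.MM.dd.HH}"
    document_id => "%{userid}"
  }
}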