import old data from postgres to elasticsearch

Posted 2019-08-18 07:58

Question:

I have a lot of data in my Postgres database (on a remote server). It covers the past year, and I want to push it to Elasticsearch now.

The data has a time field in this format: 2016-09-07 19:26:36.817039+00.

I want this to be the time field (@timestamp) in Elasticsearch, so that I can view it in Kibana and build visualizations over the last year.

I need help pushing all this data efficiently; I can't figure out how to get it all out of Postgres.

I know we can ingest data via the JDBC plugin, but I don't think I can create my @timestamp field with that.

I also know about ZomboDB, but I'm not sure whether it lets me supply my own time field.

Also, there is a lot of data, so I am looking for an efficient solution.

Any suggestions on how to do this are welcome.

Answer 1:

I know we can ingest data via the JDBC plugin, but I don't think I can create my @timestamp field with that.

This should be doable with Logstash. A good starting point is probably this blog post. Remember that a Logstash pipeline always consists of three parts:

  1. Input: the JDBC input. If you only need to import once, skip the schedule; otherwise set the right timing in cron syntax (see the input sketch after this list).
  2. Filter: this part is not covered in the blog post. You will need the date filter to set the right @timestamp value; an example follows below.
  3. Output: This is simply the Elasticsearch output.
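For point 1, a minimal one-off input could look roughly like the sketch below. The connection string, credentials, driver jar path and query are placeholders you would swap for your own values, and with no schedule set the statement runs exactly once:

input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://your-remote-host:5432/your_db" # placeholder connection string
    jdbc_user => "your_user"
    jdbc_password => "your_password"
    jdbc_driver_library => "/path/to/postgresql-jdbc.jar" # path to the PostgreSQL JDBC driver jar
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * FROM your_table" # placeholder query; select the columns you want indexed
    # no schedule option => the statement runs exactly once
  }
}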

This will depend on the format and field name of the timestamp value in PostgreSQL, but the filter part should look something like this:

date {
   match => ["your_date_field", "dd-MM-yyyy HH:mm:ss"]
   remove_field => ["your_date_field"] # Remove the now redundant field, since it is stored in @timestamp (the default target of date)
}
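Note that the date filter writes the parsed value into @timestamp by default (its target option defaults to "@timestamp"), which addresses the original concern: the value from your own PostgreSQL column becomes the time field that Kibana uses.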

If you're concerned about performance:

  • You will need to set the right jdbc_fetch_size (see the sketch after this list).
  • The Elasticsearch output is batched by default.
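As a rough sketch of where those settings live (the fetch size, host and index name below are placeholders, not recommendations):

input {
  jdbc {
    # ... connection settings as in the input sketch above ...
    jdbc_fetch_size => 10000 # hint for how many rows the JDBC driver fetches per round trip; tune for your data
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"] # placeholder; point this at your cluster
    index => "your_index"              # placeholder index name
  }
}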