I'm using AWS data pipeline service to pipe data from a RDS MySql
database to s3
and then on to Redshift
, which works nicely.
However, I also have data living in an RDS Postres
instance which I would like to pipe the same way but I'm having a hard time setting up the jdbc-connection. If this is unsupported, is there a work-around?
"connectionString": "jdbc:postgresql://THE_RDS_INSTANCE:5432/THE_DB”
this doesn't work yet. aws hasnt built / released the functionality to connect nicely to postgres. you can do it in a shellcommandactivity though. you can write a little ruby or python code to do it and drop that in a script on s3 using scriptUri. you could also just write a psql command to dump the table to a csv and then pipe that to OUTPUT1_STAGING_DIR with "staging: true" in that activity node.
something like this:
i didn't run this to verify because it's a pain to spin up a pipeline :( so double check the escaping in the command.
look into the new stuff aws just launched on parameterized templating data pipelines: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-templates.html. it looks like it will allow encryption of arbitrary parameters.
Nowadays you can define a copy-activity to extract data from a Postgres RDS instance into S3. In the Data Pipeline interface:
AWS now allow partners to do near real time RDS -> Redshift inserts.
https://aws.amazon.com/blogs/aws/fast-easy-free-sync-rds-to-redshift/