I'm wondering about the feasibility of connecting an AWS Data Pipeline to a Heroku Database. The heroku databases are stored on EC2 instances (east region), and require SSL.
I've tried to open up a connection using a JdbcDatabase
Object, but have run into issues at every turn.
I've tried the following:
{
"id" : "heroku_database",
"name" : "heroku_database",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "org.postgresql.Driver",
"connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
"jdbcProperties": "ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
"username" : "#{myHerokuDatabaseUserName}",
"*password" : "#{*myHerokuDatabasePassword}"
},
with the result of:
unable to find valid certification path to requested target
ActivityFailed:SunCertPathBuilderException
as well as:
{
"id" : "heroku_database",
"name" : "heroku_database",
"type" : "JdbcDatabase",
"jdbcDriverClass" : "org.postgresql.Driver",
"connectionString" : "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
"jdbcProperties": "sslmode=require",
"username" : "#{myHerokuDatabaseUserName}",
"*password" : "#{*myHerokuDatabasePassword}"
},
with the result of:
amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql://ec2-54-235-something-something.compute-1.amazonaws.com:5442/redacted FATAL: no pg_hba.conf entry for host "52.13.105.196", user "redacted", database "redacted", SSL off
To boot -- I have also tried to use a ShellCommandActivity
to copy the postgres table from the ec2 instance and stdout it to my s3 bucket -- however the ec2 instance doesn't understand the psql command:
{
"id": "herokuDatabaseDump",
"name": "herokuDatabaseDump",
"type": "ShellCommandActivity",
"runsOn": {
"ref": "Ec2Instance"
},
"stage": "true",
"stdout": "#{myOutputS3Loc}/#{myOutputFileName}",
"command": "PGPASSWORD=#{*myHerokuDatabasePassword} psql -h #{myHerokuDatabaseHost} -U #{myHerokuDatabaseUserName} -d #{myHerokuDatabaseName} -p #{myHerokuDatabasePort} -t -A -F',' -c 'select * #{myHerokuDatabaseTableName}'"
},
and I also cannot yum install postgres beforehand.
It sucks to have both RDS and Heroku as our database sources. Any ideas on how to get a select query to run against a heroku postgres db via a data pipeline would be a great help. Thanks.
It looks like Heroku needs/wants the postgres 42.2.1 driver: https://devcenter.heroku.com/articles/heroku-postgresql#connecting-in-java. Or at least if you are compiling a java app that's what they tell you to use.
I wasn't able to find out which driver Data Pipeline uses by default but it allows you to use the
jdbcDriverJarUri
and specify custom driver jars: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.htmlAn important note here is that it requires Java7, so you are going to want to use the postgres-42.2.1.jre7.jar: https://jdbc.postgresql.org/download.html
That combined with a
jdbcProperties
field ofsslmode=require
should allow it to go through and create the dump file you are looking for.