AWS Data Pipelines with a Heroku Database

Posted 2019-08-27 06:42

I'm wondering whether it's feasible to connect an AWS Data Pipeline to a Heroku database. Heroku databases are hosted on EC2 instances (US East region) and require SSL.

I've tried to open a connection using a JdbcDatabase object, but have run into issues at every turn.

I've tried the following:

{
  "id": "heroku_database",
  "name": "heroku_database",
  "type": "JdbcDatabase",
  "jdbcDriverClass": "org.postgresql.Driver",
  "connectionString": "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
  "jdbcProperties": "ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory",
  "username": "#{myHerokuDatabaseUserName}",
  "*password": "#{*myHerokuDatabasePassword}"
}

with the result of:

unable to find valid certification path to requested target
ActivityFailed:SunCertPathBuilderException

as well as:

{
  "id": "heroku_database",
  "name": "heroku_database",
  "type": "JdbcDatabase",
  "jdbcDriverClass": "org.postgresql.Driver",
  "connectionString": "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
  "jdbcProperties": "sslmode=require",
  "username": "#{myHerokuDatabaseUserName}",
  "*password": "#{*myHerokuDatabasePassword}"
}

with the result of:

amazonaws.datapipeline.database.ConnectionFactory: Unable to establish connection to jdbc:postgresql://ec2-54-235-something-something.compute-1.amazonaws.com:5442/redacted FATAL: no pg_hba.conf entry for host "52.13.105.196", user "redacted", database "redacted", SSL off

On top of that, I also tried using a ShellCommandActivity to query the Postgres table from the EC2 instance and write the output to my S3 bucket via stdout. However, the EC2 instance doesn't recognize the psql command:

{
  "id": "herokuDatabaseDump",
  "name": "herokuDatabaseDump",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "Ec2Instance" },
  "stage": "true",
  "stdout": "#{myOutputS3Loc}/#{myOutputFileName}",
  "command": "PGPASSWORD=#{*myHerokuDatabasePassword} psql -h #{myHerokuDatabaseHost} -U #{myHerokuDatabaseUserName} -d #{myHerokuDatabaseName} -p #{myHerokuDatabasePort} -t -A -F',' -c 'select * from #{myHerokuDatabaseTableName}'"
}

I also can't install Postgres on the instance beforehand with yum.

It sucks to have both RDS and Heroku as our database sources. Any ideas on how to get a select query to run against a Heroku Postgres database via Data Pipeline would be a great help. Thanks.

1 answer
SAY GOODBYE
Answered 2019-08-27 06:57

It looks like Heroku expects version 42.2.1 of the PostgreSQL JDBC driver: https://devcenter.heroku.com/articles/heroku-postgresql#connecting-in-java. At least, that's the version Heroku tells you to use if you are building a Java app.

I wasn't able to find out which driver Data Pipeline uses by default, but it lets you supply a custom driver jar through the jdbcDriverJarUri field: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html

An important note: Data Pipeline requires Java 7, so you'll want the Java 7 build of the driver, postgresql-42.2.1.jre7.jar: https://jdbc.postgresql.org/download.html

Combined with a jdbcProperties value of sslmode=require, that should let the connection go through and create the dump file you are looking for.
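A minimal sketch of how these pieces might fit together in a pipeline definition, not a tested configuration. It assumes the driver jar has been uploaded to a hypothetical s3://my-bucket/drivers/ location readable by the pipeline's role, and that the pipeline already defines the Ec2Instance resource and the my* parameters used in the question. The SqlDataNode, S3DataNode, CSV and CopyActivity objects (and their ids) are illustrative additions, not part of the original answer, and the Default and schedule objects are omitted. JSON allows no comments, so every hypothetical value is called out here instead.

```json
{
  "objects": [
    {
      "id": "heroku_database",
      "name": "heroku_database",
      "type": "JdbcDatabase",
      "jdbcDriverClass": "org.postgresql.Driver",
      "jdbcDriverJarUri": "s3://my-bucket/drivers/postgresql-42.2.1.jre7.jar",
      "connectionString": "jdbc:postgresql://#{myHerokuDatabaseHost}:#{myHerokuDatabasePort}/#{myHerokuDatabaseName}",
      "jdbcProperties": "sslmode=require",
      "username": "#{myHerokuDatabaseUserName}",
      "*password": "#{*myHerokuDatabasePassword}"
    },
    {
      "id": "herokuTable",
      "name": "herokuTable",
      "type": "SqlDataNode",
      "database": { "ref": "heroku_database" },
      "table": "#{myHerokuDatabaseTableName}",
      "selectQuery": "select * from #{table}"
    },
    {
      "id": "csvFormat",
      "name": "csvFormat",
      "type": "CSV"
    },
    {
      "id": "herokuTableDump",
      "name": "herokuTableDump",
      "type": "S3DataNode",
      "filePath": "#{myOutputS3Loc}/#{myOutputFileName}",
      "dataFormat": { "ref": "csvFormat" }
    },
    {
      "id": "copyHerokuTableToS3",
      "name": "copyHerokuTableToS3",
      "type": "CopyActivity",
      "input": { "ref": "herokuTable" },
      "output": { "ref": "herokuTableDump" },
      "runsOn": { "ref": "Ec2Instance" }
    }
  ]
}
```

With this layout the select query runs over JDBC inside CopyActivity rather than through psql, so nothing extra should need to be installed on the EC2 instance.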
