DynamoDB InputFormat for Hadoop

Posted 2019-04-12 07:49

I need to process data stored in Amazon DynamoDB using Hadoop MapReduce.

I searched the internet for a Hadoop InputFormat for DynamoDB and couldn't find one. I'm not familiar with DynamoDB, so I'm guessing there is some trick to using DynamoDB with Hadoop? If an implementation of this InputFormat exists anywhere, could you please share it?

2 answers
爱情/是我丢掉的垃圾
Reply 2 · 2019-04-12 08:37

After a lot of searching, I found DynamoDBInputFormat and DynamoDBOutputFormat in one of Amazon's libraries.

On Amazon Elastic MapReduce there is a library called hive-bigbird-handler, which contains input and output formats for DynamoDB. The full class names are org.apache.hadoop.hive.dynamodb.write.DynamoDBOutputFormat and org.apache.hadoop.hive.dynamodb.read.DynamoDBInputFormat.

I hope these classes will be useful to the community.
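
Before wiring these classes into a job, it may help to confirm that the library is actually visible to your JVM. The following is only a minimal sketch: it uses plain Java reflection and the two class names quoted above, and the class name DynamoDbFormatCheck and the helper isOnClasspath are hypothetical names made up for this illustration. It does not configure or run a MapReduce job.

```java
public class DynamoDbFormatCheck {
    // Class names as quoted in the answer above; whether they are present
    // depends on the EMR image and on which jars are on the classpath.
    static final String[] FORMAT_CLASSES = {
        "org.apache.hadoop.hive.dynamodb.read.DynamoDBInputFormat",
        "org.apache.hadoop.hive.dynamodb.write.DynamoDBOutputFormat"
    };

    // Hypothetical helper: returns true if the named class can be located
    // by this class's class loader, false otherwise.
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, DynamoDbFormatCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or present but with unresolved dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : FORMAT_CLASSES) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "missing"));
        }
    }
}
```

Run on a machine without the EMR libraries, both classes should be reported as missing; on an EMR node the result depends on whether the hive-bigbird-handler jar is on the classpath used to launch the check.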

别忘想泡老子
Reply 3 · 2019-04-12 08:40

I couldn't find an InputFormat that you can use directly in MapReduce. However, the article "AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB (Guest Post)" describes how to run MapReduce jobs using Hive.
