Athena can't resolve CSV files from AWS DMS

Posted 2019-06-01 16:50

I have DMS configured to continuously replicate data from MySQL RDS to S3. This creates two types of CSV files: a full load and change data capture (CDC). In my tests, I end up with the following files:

testdb/addresses/LOAD001.csv.gz
testdb/addresses/20180405_205807186_csv.gz

Once DMS is running properly, I trigger an AWS Glue crawler to build the Data Catalog for the S3 bucket that contains the MySQL replication files, so that Athena users can run queries against our S3-based data lake.

Unfortunately, the crawler does not build the correct schema for the tables stored in S3. For the example above, it creates two tables for Athena:

addresses
20180405_205807186_csv_gz

The file 20180405_205807186_csv.gz contains a one-line update, but the crawler is not capable of merging the two pieces of information (taking the initial load from LOAD001.csv.gz and applying the update described in 20180405_205807186_csv.gz).
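
To make the merge I'm expecting concrete, here is a minimal Python sketch of it. It rests on assumptions that are not verified against my actual files: that each CDC row carries a leading operation column (I, U or D), that the full-load rows do not, and that the first data column is the primary key. The function names and sample rows are made up for illustration.

```python
import csv
import gzip
import io


def read_gzip_csv(data: bytes):
    """Parse gzip-compressed CSV bytes into a list of rows."""
    with gzip.open(io.BytesIO(data), mode="rt", newline="") as handle:
        return list(csv.reader(handle))


def apply_cdc(full_load_rows, cdc_rows, key_index=0):
    """Replay CDC rows on top of full-load rows.

    Assumption: each CDC row starts with an operation flag
    ("I" insert, "U" update, "D" delete) followed by the same
    columns as the full-load rows; key_index is the position of
    the primary key within those columns.
    """
    table = {row[key_index]: row for row in full_load_rows}
    for op, *row in cdc_rows:
        key = row[key_index]
        if op in ("I", "U"):
            table[key] = row       # insert a new row or overwrite the old one
        elif op == "D":
            table.pop(key, None)   # drop the row if it exists
    return list(table.values())


# Hypothetical sample data standing in for LOAD001.csv.gz and the CDC file.
full_load = [["1", "Main St"], ["2", "Oak Ave"]]
cdc = [["U", "2", "Elm Rd"]]
merged = apply_cdc(full_load, cdc)
```

With the sample rows above, `merged` holds the row for key 1 unchanged and the row for key 2 with its updated value, which is the result I would like Athena to return when querying the folder.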

I also tried creating the table in the Athena console, as described in this blog post: https://aws.amazon.com/pt/blogs/database/using-aws-database-migration-service-and-amazon-athena-to-replicate-and-run-ad-hoc-queries-on-a-sql-server-database/. But it does not yield the desired output.

From the blog post:

When you query data using Amazon Athena (later in this post), you simply point the folder location to Athena, and the query results include existing and new data inserts by combining data from both files.

Am I missing something?

Tags: amazon-athena aws-dms

0 answers
