AWS Glue Crawler fails with 11 million files on S3

Posted 2019-07-05 00:34


Question:

I have more than 11 million JSON files in S3.

I tried to crawl them and catalog them in AWS Glue.

JSON File Details:

Each file is between 250 KB and 2 MB uncompressed.
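To put those figures in perspective, here is a rough back-of-the-envelope estimate of the total data volume. It assumes exactly 11 million files (the question says "11 million+", so this is a lower bound) and decimal units (1 KB = 1000 bytes); the function name is only illustrative.

```python
# Rough scale estimate from the figures in the question.
FILE_COUNT = 11_000_000            # "11 million+" files, taken as a lower bound
MIN_FILE_BYTES = 250 * 1000        # 250 KB, assuming decimal units
MAX_FILE_BYTES = 2 * 1000 ** 2     # 2 MB, assuming decimal units


def total_terabytes(file_count, bytes_per_file):
    """Total size in TB if every file had the given size."""
    return file_count * bytes_per_file / 1000 ** 4


low_tb = total_terabytes(FILE_COUNT, MIN_FILE_BYTES)
high_tb = total_terabytes(FILE_COUNT, MAX_FILE_BYTES)
print(f"{low_tb:.2f} TB to {high_tb:.2f} TB")
```

So the data set is somewhere between roughly 2.75 TB and 22 TB, spread over a very large number of small objects.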

Logs:

BENCHMARK : Running Start Crawl for Crawler impall
ERROR : Internal Service Exception
BENCHMARK : Crawler has finished running and is in state READY
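The log above gives only a generic error. As a possible way to get more detail, the sketch below reads the crawler's last-run information through the Glue `get_crawler` API as exposed by boto3. The helper name `summarize_last_crawl` is hypothetical, and whether the returned fields say anything more specific than "Internal Service Exception" in this case is not known.

```python
def summarize_last_crawl(glue_client, crawler_name):
    """Return the crawler state plus status, error and log location of its last run.

    glue_client is expected to behave like boto3.client("glue");
    it is passed in so the helper can be exercised without AWS access.
    """
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    # "LastCrawl" is missing if the crawler has never run.
    last_crawl = crawler.get("LastCrawl", {})
    return {
        "state": crawler.get("State"),
        "last_status": last_crawl.get("Status"),
        "error_message": last_crawl.get("ErrorMessage"),
        "log_group": last_crawl.get("LogGroup"),
        "log_stream": last_crawl.get("LogStream"),
    }
```

With boto3 installed and AWS credentials configured, this could be called as `summarize_last_crawl(boto3.client("glue"), "impall")`, where `impall` is the crawler name from the log. The log group and log stream, if present, point to the CloudWatch location that may hold a fuller trace.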

Am I missing a step for processing such a large number of files?

Tags: amazon-web-services aws-glue

一夜七次
