AWS Glue Crawler fails with 11 million files on S3

Posted 2019-07-05 00:34

Question:

I have more than 11 million JSON files in S3.

I tried to crawl and catalog them with an AWS Glue crawler.

JSON File Details:

Each file is between 250 KB and 2 MB uncompressed.
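For a sense of scale, here is a rough estimate of the total data volume implied by those numbers. It is only a sketch: it assumes exactly 11 million files (the real count is somewhat higher) and decimal units (1 KB = 1000 bytes), and the names are purely illustrative.

```python
# Rough size estimate from the figures above.
# Assumptions: exactly 11 million files, decimal units (1 KB = 1000 bytes).
FILE_COUNT = 11_000_000
MIN_FILE_BYTES = 250 * 1000        # 250 KB
MAX_FILE_BYTES = 2 * 1000 * 1000   # 2 MB


def total_size_tb(file_count, bytes_per_file):
    """Total size in terabytes if every file had the given size."""
    return file_count * bytes_per_file / 1e12


low_tb = total_size_tb(FILE_COUNT, MIN_FILE_BYTES)
high_tb = total_size_tb(FILE_COUNT, MAX_FILE_BYTES)
print(f"Estimated total: {low_tb:.2f} TB to {high_tb:.2f} TB")
```

So the dataset is somewhere between roughly 2.75 TB and 22 TB uncompressed, spread over a very large number of small objects.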

Logs:

BENCHMARK : Running Start Crawl for Crawler impall
ERROR : Internal Service Exception
BENCHMARK : Crawler has finished running and is in state READY
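The log itself gives no detail beyond "Internal Service Exception". One way to look for more might be to read the crawler's last-run metadata through boto3's `get_crawler` call. This is only a sketch: the helper name `last_crawl_summary` is hypothetical, the client is passed in so the function has no AWS dependency of its own, and there is no guarantee the service returns a more specific message than the one above.

```python
def last_crawl_summary(glue_client, crawler_name):
    """Collect the state and last-run details for one Glue crawler.

    glue_client is expected to behave like boto3.client("glue").
    Missing fields come back as None.
    """
    crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
    last = crawler.get("LastCrawl", {})
    return {
        "state": crawler.get("State"),
        "status": last.get("Status"),
        "error": last.get("ErrorMessage"),
        "log_group": last.get("LogGroup"),
        "log_stream": last.get("LogStream"),
    }


# Usage (needs AWS credentials and a region configured):
#   import boto3
#   print(last_crawl_summary(boto3.client("glue"), "impall"))
```

The `log_group` and `log_stream` values, when present, point at the CloudWatch Logs stream for that run.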

Am I missing a step needed to process such a large set of files?