Can I customize partitioning in Kinesis Firehose b

Posted 2020-03-20 14:34

I have a Firehose stream that is intended to ingest millions of events from different sources and of different event types. The stream should deliver all data to one S3 bucket as a store of raw, unaltered data.

I was thinking of partitioning this data in S3 based on metadata embedded within the event message, such as event-source, event-type and event-date.

However, Firehose follows its default partitioning based on record arrival time. Is it possible to customize this partitioning behavior to fit my needs?

Tags: amazon-s3 amazon-kinesis-firehose

1 Answer

够拽才男人

Reply #2 · 2020-03-20 14:57

No. You cannot 'partition' based upon event content.

Some options are:

Send to separate Firehose streams
Send to a Kinesis Data Stream (instead of Firehose) and write your own custom Lambda function to process and save the data (See: AWS Developer Forums: Athena and Kinesis Firehose)
Use Kinesis Analytics to process the message and 'direct' it to different Firehose streams
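
For the second option (a Kinesis Data Stream with your own Lambda function), the core of the work is deriving an S3 key prefix from each event's metadata. Below is a minimal sketch of that step only, not a complete implementation. The field names `event_source`, `event_type` and `event_date`, the Hive-style `key=value/` prefix layout, and the helper names `partition_prefix` and `group_by_partition` are all assumptions made for illustration; the only thing taken from AWS is the shape of the Kinesis event that Lambda receives (base64-encoded payloads under `Records[].kinesis.data`).

```python
import base64
import json
from collections import defaultdict


def partition_prefix(event: dict) -> str:
    """Build an S3 key prefix from metadata embedded in the event.

    The field names used here are hypothetical; adjust them to your schema.
    """
    return (
        f"source={event['event_source']}/"
        f"type={event['event_type']}/"
        f"date={event['event_date']}/"
    )


def group_by_partition(kinesis_event: dict) -> dict:
    """Decode the records in a Kinesis-triggered Lambda event and group
    the unaltered payloads by their target S3 prefix."""
    groups = defaultdict(list)
    for record in kinesis_event["Records"]:
        # Lambda delivers each Kinesis payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        event = json.loads(raw)
        # Keep the original payload text so the stored data stays raw.
        groups[partition_prefix(event)].append(raw.decode("utf-8"))
    return dict(groups)
```

A Lambda handler could call `group_by_partition(event)` and then write each group as one object under its prefix, for example with boto3's `put_object`. Error handling for malformed records and missing fields is left out of this sketch.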

If you are going to use the output with Amazon Athena or Amazon EMR, you could also consider converting it into Parquet format, which has much better performance. This would require post-processing of the data in S3 as a batch rather than converting the data as it arrives in a stream.
