I've got an application for which I only need the bandwidth of one Kinesis shard, but I need many Lambda function invocations running in parallel to keep up with the record processing. My records are large (some approach the 1 MB record size limit), but the incoming rate is only 1 MB/s, since a single EC2 instance populates the stream. Because each record contains an internal timestamp, I don't care about processing them in order. Basically, I have several months' worth of data to migrate, and I want to do it in parallel.
The processed records are written to a database cluster that can handle 1000 concurrent clients, so my previous solution was to split the Kinesis stream into 50 shards. However, this has proved expensive, because I only need the shards to parallelize the processing: I'm using less than 1% of their bandwidth, and I also had to increase the retention period.
Long term, I imagine the answer involves splitting my records up, so that the consumption time isn't such a huge multiple of the production time. That's not an option right now, but I realize I'm abusing the system slightly.
Is there a way to attach one order-preserving Lambda function to a single-shard Kinesis stream and have it asynchronously invoke another Lambda function on each batch of records? Then I could use a single Kinesis shard (or another data source) and still get massively parallel processing.
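A minimal sketch of that fan-out, assuming Python with boto3: a dispatcher function attached to the single shard splits each Kinesis batch into smaller chunks and invokes a worker function asynchronously (`InvocationType="Event"`) on each chunk. The worker name `record-worker` and the 200,000-byte chunk budget are hypothetical. Check your account's asynchronous invocation payload quota: Lambda delivers Kinesis data base64-encoded, which adds about a third, so records near 1 MB may not fit at all, in which case a reference (shard ID and sequence number) would have to be passed instead of the data.

```python
import json

WORKER_FUNCTION = "record-worker"  # hypothetical name of the worker Lambda
MAX_PAYLOAD_BYTES = 200_000        # assumed per-invocation budget; check your quota


def batch_records(records, max_bytes=MAX_PAYLOAD_BYTES):
    """Split Kinesis event records into chunks whose compact JSON stays under max_bytes.

    A single record larger than max_bytes still gets a chunk of its own,
    so the caller must decide what to do with oversized records."""
    batches, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for rec in records:
        item = {
            "sequenceNumber": rec["kinesis"]["sequenceNumber"],
            "data": rec["kinesis"]["data"],  # still base64-encoded, as Lambda delivered it
        }
        item_size = len(json.dumps(item, separators=(",", ":"))) + 1  # +1 for a comma
        if current and size + item_size > max_bytes:
            batches.append(current)
            current, size = [], 2
        current.append(item)
        size += item_size
    if current:
        batches.append(current)
    return batches


def handler(event, context, lambda_client=None):
    """Dispatcher attached to the single-shard stream; fans work out asynchronously."""
    if lambda_client is None:
        import boto3  # imported lazily so batch_records can be tested without AWS
        lambda_client = boto3.client("lambda")
    batches = batch_records(event["Records"])
    for batch in batches:
        lambda_client.invoke(
            FunctionName=WORKER_FUNCTION,
            InvocationType="Event",  # asynchronous: returns once Lambda has queued the event
            Payload=json.dumps(batch, separators=(",", ":")).encode("utf-8"),
        )
    return {"dispatched": len(batches)}
```

Because the dispatcher returns as soon as the invocations are queued, the shard's iterator advances even if a worker later fails, so failed chunks would have to be caught on the worker side (for example with an on-failure destination or a dead-letter queue).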
Really, all I need is an option in the Lambda event source configuration for Kinesis that says "I don't care about preserving the order of these records." But then I suppose tracking the iterator position after failed executions becomes more of a challenge.
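The Kinesis event source mapping has a setting close to this: `ParallelizationFactor` (1 to 10) lets Lambda process up to that many batches from one shard concurrently, while still keeping records with the same partition key in order. A hedged sketch with boto3 follows; the stream ARN and function name are placeholders, and it assumes the producer spreads records over many partition keys (with a single partition key, everything stays serialized). `BisectBatchOnFunctionError` and `MaximumRetryAttempts` address the failed-execution concern by splitting a failing batch and bounding retries instead of blocking the shard.

```python
def mapping_params(stream_arn, function_name, parallelization_factor=10, batch_size=10):
    """Build arguments for lambda.create_event_source_mapping on a Kinesis stream."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "TRIM_HORIZON",  # start from the oldest retained record
        "BatchSize": batch_size,  # kept small, since records can approach 1 MB
        "ParallelizationFactor": parallelization_factor,  # concurrent batches per shard
        "BisectBatchOnFunctionError": True,  # split a failing batch to isolate bad records
        "MaximumRetryAttempts": 3,  # stop retrying instead of blocking the shard indefinitely
    }


def create_mapping(lambda_client, stream_arn, function_name):
    """Attach function_name to the stream; lambda_client is a boto3 Lambda client."""
    return lambda_client.create_event_source_mapping(**mapping_params(stream_arn, function_name))
```

Usage would be along the lines of `create_mapping(boto3.client("lambda"), stream_arn, "record-processor")`, where `record-processor` is a hypothetical function name and the caller needs permission to create event source mappings.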
According to someone who works at AWS, it is possible to attach several Lambda functions to the same Kinesis stream. So far, though, my tests haven't worked.
EDIT:
It's working properly.
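With several functions attached to the same stream, each event source mapping reads every record, so attaching N copies multiplies the work unless each copy skips the records that belong to another. A minimal sketch of that split, assuming each copy is configured with hypothetical `WORKER_INDEX` and `WORKER_COUNT` environment variables and a hypothetical `process` function standing in for the database write: a record is handled only by the copy whose index matches a hash of its sequence number. All copies still share the shard's read throughput (a few GetRecords calls per second), so this probably scales to a handful of functions rather than hundreds.

```python
import base64
import hashlib
import os

# Hypothetical configuration: each attached copy gets its own index in [0, count).
WORKER_INDEX = int(os.environ.get("WORKER_INDEX", "0"))
WORKER_COUNT = int(os.environ.get("WORKER_COUNT", "1"))


def owns(sequence_number, index=WORKER_INDEX, count=WORKER_COUNT):
    """True if this copy is responsible for the record with this sequence number.

    The sequence number is hashed so the split stays even however the
    numbers happen to be laid out."""
    digest = hashlib.sha256(sequence_number.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % count == index


def process(payload):
    """Hypothetical placeholder for the real database write."""
    return len(payload)


def handler(event, context):
    handled = 0
    for rec in event["Records"]:
        kinesis = rec["kinesis"]
        if not owns(kinesis["sequenceNumber"]):
            continue  # another copy of the function handles this record
        process(base64.b64decode(kinesis["data"]))
        handled += 1
    return {"handled": handled}
```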