I'm using a service which outputs to an Event Hub.
We want to store that output so it can be read once per day by a batch job running on Apache Spark. Basically we figured: just get all the messages dumped to blobs.
What's the easiest way to capture messages from an Event Hub to Blob Storage?
Our first thought was a Stream Analytics job, but it requires parsing the raw message as CSV, JSON, or Avro, and our current format is none of those.
Update: We solved this problem by changing our message format. I'd still like to know if there's any low-impact way to store messages to blobs. Did Event Hubs have a solution for this before Stream Analytics arrived?
Azure now has this built in: Event Hubs Archive (in preview).
You can use Event Hubs Capture to capture events to a blob.
You could write your own worker process that reads messages off the Event Hub and stores them in Blob Storage. You do not need to do this in real time, since messages remain on the Event Hub for the configured retention period. The client reading the Event Hub is responsible for tracking which messages have been processed, by keeping track of each message's partition ID and offset. There is a C# library, the EventProcessorHost, that makes this extremely easy and scales really well (a minimal sketch follows below): https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
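Here's a minimal sketch of such a worker, using the Microsoft.ServiceBus.Messaging EventProcessorHost API from the linked article; the hub name, container name, and connection strings are placeholders you'd fill in:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// One instance is created per Event Hub partition; EventProcessorHost
// balances partitions across all running worker instances.
class BlobArchiver : IEventProcessor
{
    CloudBlobContainer container;

    public async Task OpenAsync(PartitionContext context)
    {
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        container = account.CreateCloudBlobClient().GetContainerReference("eventhub-archive");
        await container.CreateIfNotExistsAsync();
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        foreach (var message in messages)
        {
            // Name blobs by partition and offset so a replay overwrites
            // the same blob instead of creating duplicates.
            var blob = container.GetBlockBlobReference(
                context.Lease.PartitionId + "/" + message.Offset);
            using (var body = new MemoryStream(message.GetBytes()))
            {
                await blob.UploadFromStreamAsync(body); // raw bytes, no parsing
            }
        }
        // Checkpoint so a restart resumes from here instead of re-reading.
        await context.CheckpointAsync();
    }

    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        return Task.FromResult(true);
    }
}

class Program
{
    static void Main()
    {
        var host = new EventProcessorHost(
            "blob-archiver-1",                      // unique name for this host instance
            "<event-hub-name>",
            EventHubConsumerGroup.DefaultGroupName,
            "<event-hub-connection-string>",
            "<storage-connection-string>");         // also holds lease/checkpoint blobs
        host.RegisterEventProcessorAsync<BlobArchiver>().Wait();
        Console.WriteLine("Receiving. Press enter to stop.");
        Console.ReadLine();
        host.UnregisterEventProcessorAsync().Wait();
    }
}
```

EventProcessorHost stores its checkpoints (partition ID + offset) in the same storage account, so restarting the worker picks up where it left off.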
You can also do this via an Azure Function (serverless code) that fires from an Event Hub trigger.
Depending on your requirements, this can work better than the Event Hubs Capture feature if you need a capability it doesn't have, such as saving as GZIP or writing a more custom blob virtual directory structure (see the sketch after the link below).
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs#trigger-usage
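As an illustration, here is a minimal sketch using the precompiled C# functions model; the hub name ("myhub"), blob path, and connection setting names ("EventHubConnection", "StorageConnection") are placeholder assumptions, and GZIP output is shown because that's one thing Capture won't do:

```csharp
using System.IO;
using System.IO.Compression;
using Microsoft.Azure.WebJobs;

public static class ArchiveToBlob
{
    [FunctionName("ArchiveToBlob")]
    public static void Run(
        // Binds the raw event body as bytes; no parsing of the format is needed.
        [EventHubTrigger("myhub", Connection = "EventHubConnection")] byte[] message,
        // {rand-guid} gives each invocation a unique blob name; change the path
        // here to get a custom virtual directory structure.
        [Blob("eventhub-archive/{rand-guid}.gz", FileAccess.Write, Connection = "StorageConnection")]
        Stream outputBlob)
    {
        // Compress on the way out; the built-in Capture feature can't do this.
        using (var gzip = new GZipStream(outputBlob, CompressionLevel.Optimal))
        {
            gzip.Write(message, 0, message.Length);
        }
    }
}
```

Because the trigger binds the event body as a byte array, the message format doesn't matter, which sidesteps the CSV/JSON/Avro requirement that ruled out Stream Analytics.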