Connect Azure Event Hubs with Data Lake Store

Published 2019-05-07 13:28

Question:

What is the best way to send data from Event Hubs to Data Lake Store?

Answer 1:

I am assuming you want to ingest data from EventHubs to Data Lake Store on a regular basis. Like Nava said, you can use Azure Stream Analytics to get data from EventHub into Azure Storage Blobs. Thereafter you can use Azure Data Factory (ADF) to copy data on a scheduled basis from Blobs to Azure Data Lake Store. More details on using ADF are available here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-datalake-connector/. Hope this helps.

Update (March 17, 2016):

Support for Azure Data Lake Store as an output for Azure Stream Analytics is now available: https://blogs.msdn.microsoft.com/streamanalytics/2016/03/14/integration-with-azure-data-lake-store/. This is now the best option for your scenario.

Sachin Sheth

Program Manager, Azure Data Lake



Answer 2:

In addition to Nava's reply: you can query data in a Windows Azure Blob Storage container with ADLA/U-SQL as well. Or you can use the Blob Store to ADL Storage copy service (see https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-copy-data-azure-storage-blob/).



Answer 3:

One way would be to write a process that reads messages from the event hub using the Event Hub API and writes them into Data Lake Store using the Data Lake SDK.
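A minimal sketch of that process, assuming the events arrive as dicts: in a real deployment you would receive the events via the Event Hubs client SDK and upload files via the Data Lake Store SDK; both are stubbed here (`write_file` and the in-memory `store` are hypothetical stand-ins) so the batching logic is runnable on its own.

```python
import json

BATCH_SIZE = 3  # flush to the store every N events (tune for your workload)

def write_file(path, data, store):
    """Stand-in for a Data Lake Store upload; records the file in a dict."""
    store[path] = data

def process_events(events, store):
    """Buffer incoming events and flush them to the store in batches."""
    buffer = []
    for event in events:
        buffer.append(json.dumps(event))
        if len(buffer) >= BATCH_SIZE:
            write_file("/raw/events-{}.jsonl".format(len(store)),
                       "\n".join(buffer), store)
            buffer = []
    if buffer:  # flush the final partial batch
        write_file("/raw/events-{}.jsonl".format(len(store)),
                   "\n".join(buffer), store)

store = {}
process_events([{"id": i} for i in range(7)], store)
print(sorted(store))  # three files: two full batches plus one partial
```

Batching matters here because, as noted in answer 5 below, per-write costs add up quickly at high event rates.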

Another alternative would be to use Stream Analytics to get the data from Event Hub into a Blob, and Azure Automation to run a PowerShell script that reads the data from the blob and writes it into Data Lake Store.



Answer 4:

Not taking credit for this, but sharing with the community:

It is also possible to archive the events (look into properties\archive); this leaves an Avro blob.

Then, using the AvroExtractor, you can convert the records into JSON as described in Anthony's blog: http://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/
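To illustrate what the conversion does: once the Avro blob has been deserialized (for example with an Avro library, or the AvroExtractor mentioned above), each record carries the original event payload as raw bytes in a `Body` field. The field names below follow the documented Event Hubs capture schema; the sample record itself is made up for illustration.

```python
import json

def capture_record_to_json(record):
    """Decode the Body bytes of a deserialized capture record into a dict."""
    return json.loads(record["Body"].decode("utf-8"))

# Hypothetical capture record, already deserialized from Avro into a dict.
sample = {
    "SequenceNumber": 12,
    "Offset": "8589936144",
    "EnqueuedTimeUtc": "5/7/2019 1:28:00 PM",
    "Body": b'{"deviceId": "sensor-1", "temperature": 21.5}',
}
print(capture_record_to_json(sample)["deviceId"])  # sensor-1
```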



Answer 5:

One of the ways would be to connect your Event Hub to Data Lake using the Event Hub capture functionality (Data Lake and Blob Storage are currently supported). Event Hub will write to Data Lake on every N-minute interval or once a data size threshold is reached. Capture is used to optimize storage write operations, as they are expensive at high scale.
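For orientation, capture lays the Avro files out under a configurable naming convention; the sketch below builds paths following the default format string `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}` (an assumption based on the documented default; verify against your own capture configuration, where the names and order can differ).

```python
from datetime import datetime, timezone

def capture_path(namespace, eventhub, partition_id, window_start):
    """Build the default-style capture path for one capture window."""
    return "{0}/{1}/{2}/{3:%Y/%m/%d/%H/%M/%S}".format(
        namespace, eventhub, partition_id, window_start
    )

start = datetime(2016, 3, 14, 9, 5, 0, tzinfo=timezone.utc)
print(capture_path("mynamespace", "myhub", 0, start))
# mynamespace/myhub/0/2016/03/14/09/05/00
```

Knowing the layout makes it easy to point a downstream U-SQL or ADF job at exactly the time windows you want.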

The data is stored in Avro format, so if you want to query it using U-SQL you'd have to use an extractor class. Uri gave a good reference to it: https://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/.