I need to store the messages pushed to Kafka in deep storage. We are using Azure cloud services, so I suppose Azure Blob Storage could be a good option. I want to use Kafka Connect's sink connector API to push data to Azure Blob Storage. The Kafka documentation mostly suggests HDFS for exporting data, but in that case I would need a Linux VM running Hadoop, which I guess would be costly. My question is: is Azure Blob Storage an appropriate choice for storing JSON objects, and is building a custom sink connector a reasonable solution for this case?
Answer 1:
A custom sink connector definitely works. Kafka Connect was absolutely designed so you could plug in connectors. In fact, connector development is entirely federated: Confluent's JDBC and HDFS connectors were implemented first simply due to the popularity of those two use cases, but there are many more (we keep a list of the connectors we're aware of here).
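For illustration, a minimal sink connector skeleton against the Kafka Connect API could look like the sketch below. The class name, config keys, and version string are hypothetical, not part of any existing Azure connector; the connector class only declares its configuration and hands each task a copy of the settings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.sink.SinkConnector;

// Hypothetical connector class: declares its configuration and hands
// each task a copy of the connector-level settings.
public class AzureBlobSinkConnector extends SinkConnector {
    public static final String CONTAINER_CONFIG = "azblob.container.name";
    public static final String CONNECTION_CONFIG = "azblob.connection.string";

    private Map<String, String> configProps;

    @Override
    public void start(Map<String, String> props) {
        configProps = props;
    }

    @Override
    public Class<? extends Task> taskClass() {
        return AzureBlobSinkTask.class;  // the SinkTask sketched further below
    }

    @Override
    public List<Map<String, String>> taskConfigs(int maxTasks) {
        // Every task gets the same configuration; Connect assigns partitions.
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            configs.add(new HashMap<>(configProps));
        }
        return configs;
    }

    @Override
    public void stop() {
        // nothing to clean up at the connector level
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef()
            .define(CONTAINER_CONFIG, Type.STRING, Importance.HIGH,
                    "Azure Blob Storage container to write to")
            .define(CONNECTION_CONFIG, Type.PASSWORD, Importance.HIGH,
                    "Azure Storage connection string");
    }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```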
In terms of whether Azure Blob Storage is appropriate, you mention JSON objects. I think the only thing you'll want to consider is the size of the objects and whether Azure storage will handle the size and number of objects well. I am not sure about Azure storage's characteristics, but in many other object storage systems you might need to aggregate many objects into a single blob to get good performance for a large number of objects (i.e., you might need a file format that holds many JSON objects per blob).
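To illustrate that batching point, here is a rough sketch of the corresponding SinkTask, buffering JSON record values and writing each batch as one newline-delimited blob rather than one blob per message. It assumes the azure-storage-blob v12 Java client; the class name, the blob-naming scheme, and the simple in-memory counter are made up for the example and ignore restarts and exactly-once concerns.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Map;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

// Hypothetical task class: buffers JSON values and flushes them as a
// single newline-delimited blob, so many small messages become one object.
public class AzureBlobSinkTask extends SinkTask {
    private BlobContainerClient container;
    private final StringBuilder buffer = new StringBuilder();
    private long batchNumber = 0;

    @Override
    public void start(Map<String, String> props) {
        container = new BlobServiceClientBuilder()
            .connectionString(props.get(AzureBlobSinkConnector.CONNECTION_CONFIG))
            .buildClient()
            .getBlobContainerClient(props.get(AzureBlobSinkConnector.CONTAINER_CONFIG));
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        // Accumulate records in memory; each value is assumed to be a JSON string.
        for (SinkRecord record : records) {
            buffer.append(record.value()).append('\n');
        }
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        if (buffer.length() == 0) {
            return;
        }
        // Write the whole batch as one blob rather than one blob per message.
        byte[] bytes = buffer.toString().getBytes(StandardCharsets.UTF_8);
        String blobName = "kafka-batch-" + (batchNumber++) + ".json";
        container.getBlobClient(blobName)
                 .upload(new ByteArrayInputStream(bytes), bytes.length);
        buffer.setLength(0);
    }

    @Override
    public void stop() {
        // no long-lived resources to close in this sketch
    }

    @Override
    public String version() {
        return "0.1.0";
    }
}
```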