I am working on an IoT solution that will save weather data. I have been googling for some days now on how to set up the backend. I am going to use Azure IoT Hub to handle communication, but the next step is the problem.
I want to store the telemetry in a database. This is where I get confused. Some examples say that I should use Azure Blob storage, others Azure Table storage or Azure SQL.
After some years of data collection I want to start creating reports of the data. So the storage needs to be good at working with big data.
The next problem I am stuck on is the worker that will receive the D2C (device-to-cloud) messages and store them in the database. All the Azure IoT examples use a console application, and some use Azure Stream Analytics just to move the events to a database. What is the best practice? The solution needs to be able to scale.
Thanks in advance!
@KristerJohansson, as I understand your description, this is an IoT solution in which a data collector receives weather data from devices with sensors and stores it for analysis and reporting. I think the key variables that determine the data volume need to be considered first: the weather data columns, the data format, the sampling frequency, the number of devices, and so on.
For scalability and big data, in my experience the best practice is to use IoT Hub to handle communication and Stream Analytics to move the data from IoT Hub into Blob Storage. After some years of data collection, you can use Azure Machine Learning to read the data from Blob Storage for analysis and reporting.
If you have any concerns, please feel free to let me know.
If you choose IoT Hub to handle communication, you have a few options for handling the data. (Make sure IoT Hub is the right choice for you: if you don't need bidirectional communication, Azure Event Hubs may be a better choice, as it is a lot cheaper when dealing with big data.)
As for reporting, it really depends on your needs. I would avoid Blob/Table storage for any complex reporting; those two are optimized for storing a lot of data rather than for running complex queries.
If you want to build your own reporting/queries, you can choose SQL or DocumentDB, but if you choose a NoSQL solution, make sure that you will actually benefit from the schemaless architecture.
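To illustrate the kind of report query a relational store makes easy, here is a minimal sketch. It uses Python's built-in `sqlite3` purely as a local stand-in for Azure SQL; the `telemetry` table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for a SQL store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry (device_id TEXT, recorded_at TEXT, temperature_c REAL)"
)

# Hypothetical sample readings with ISO 8601 timestamps.
rows = [
    ("station-01", "2017-01-10T12:00:00", 2.0),
    ("station-01", "2017-01-20T12:00:00", 4.0),
    ("station-01", "2017-02-05T12:00:00", 6.0),
]
conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?)", rows)

# Monthly average temperature per device: the first 7 characters of the
# timestamp ("YYYY-MM") identify the month.
report = conn.execute(
    """
    SELECT device_id, substr(recorded_at, 1, 7) AS month, AVG(temperature_c)
    FROM telemetry
    GROUP BY device_id, substr(recorded_at, 1, 7)
    ORDER BY device_id, substr(recorded_at, 1, 7)
    """
).fetchall()
print(report)
```

Doing the same aggregation over raw blobs or table entities would usually mean reading and grouping the data in your own code.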
For a PaaS solution you can choose Power BI: https://powerbi.microsoft.com/en-us/blog/outputting-real-time-stream-analytics-data-to-a-power-bi-dashboard/
Disclaimer - I've answered your question on the assumption you want to use the Azure stack.
Good luck
You should read https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-endpoints
You should also look at Time Series Insights:
https://azure.microsoft.com/en-us/services/time-series-insights/
Here is my rough sketch. Time Series Insights requires a token generated from Active Directory, but it's easy to set up.
As shown above, IoT Hub supports multiple endpoints, so one endpoint can go to Time Series Insights and another to any database, such as Cosmos DB.
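The fan-out idea can be sketched in a few lines of plain Python. This is only an in-memory illustration of one message being delivered to several endpoints; it is not the IoT Hub API, and the `Router` class, the endpoint lists, and the sample message are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, object]
Condition = Callable[[Message], bool]


class Router:
    """In-memory stand-in for message routing; not the IoT Hub API."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Condition, List[Message]]] = []

    def add_route(self, condition: Condition, endpoint: List[Message]) -> None:
        """Register an endpoint that receives messages matching the condition."""
        self._routes.append((condition, endpoint))

    def dispatch(self, message: Message) -> int:
        """Deliver the message to every matching endpoint; return the count."""
        delivered = 0
        for condition, endpoint in self._routes:
            if condition(message):
                endpoint.append(message)
                delivered += 1
        return delivered


# Two hypothetical endpoints, modelled here as plain lists.
time_series_endpoint: List[Message] = []
database_endpoint: List[Message] = []

router = Router()
router.add_route(lambda m: True, time_series_endpoint)  # all telemetry
router.add_route(lambda m: True, database_endpoint)     # all telemetry

router.dispatch({"deviceId": "station-01", "temperatureC": 21.5})
print(len(time_series_endpoint), len(database_endpoint))  # prints "1 1"
```

In the real service the routes and endpoints are configured on the hub itself, so the devices do not need to know where the data ends up.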
Note: Time Series Insights can store only 400 days of data; after that, the data is deleted.
Regarding reports: Time Series Insights has extensive reporting components and is very fast. You can also access it from a programming language such as C#.
Important note: before designing any cloud architecture, please work out the sizing of the data, such as its frequency and size. Based on that, you can select the appropriate resources in Azure.
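A rough sizing estimate is simple arithmetic. The sketch below assumes hypothetical numbers (100 stations, one reading per minute, a 200-byte payload) and ignores storage overhead and compression; plug in your own values.

```python
def estimate_storage_bytes(
    devices: int, interval_seconds: int, message_bytes: int, days: int
) -> int:
    """Raw payload volume: devices x messages per day x message size x days."""
    messages_per_device_per_day = 86_400 // interval_seconds
    return devices * messages_per_device_per_day * message_bytes * days


# Hypothetical fleet: 100 stations, one 200-byte reading per minute.
per_day = estimate_storage_bytes(100, 60, 200, 1)
per_year = estimate_storage_bytes(100, 60, 200, 365)
print(f"{per_day / 1e6:.1f} MB/day, {per_year / 1e9:.2f} GB/year")
# prints "28.8 MB/day, 10.51 GB/year"
```

The messages-per-day figure matters as much as the byte count, since message count and message size are both factors when choosing a service tier.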
Please read this http://download.microsoft.com/download/A/4/D/A4DAD253-BC21-41D3-B9D9-87D2AE6F0719/Microsoft_Azure_IoT_Reference_Architecture.pdf
Azure has added an interesting new feature that addresses your problem.
It is now possible to route IoT messages directly to Azure Storage. https://azure.microsoft.com/en-us/blog/route-iot-device-messages-to-azure-storage-with-azure-iot-hub/
I haven't tested it out yet but the article looks promising.