I am working on an IoT solution that will save weather data.
I have been googling for a few days now about how to set up the backend.
I am going to use Azure IoT Hub for handling communication, but the next step is the problem.
I want to store the telemetry to a database. This is where I get confused.
Some examples say that I should use Azure Blob Storage, Azure Table Storage, or Azure SQL.
After a few years of data collection I want to start creating reports from the data, so the storage needs to handle large volumes well.
The next problem I am stuck on is the worker that will receive the D2C messages and store them in the database. All the Azure IoT examples use a console application, and some use Azure Stream Analytics just to pipe the events to a database. What is the best practice? It needs to be able to scale and follow best practice.
Thanks in advance!
You should read
https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-endpoints
You should also look at Time Series Insights:
https://azure.microsoft.com/en-us/services/time-series-insights/
Here is a rough sketch of the flow (Time Series Insights requires a token generated from Azure Active Directory, but it is easy to set up):

- The device sends data to IoT Hub (you can even use the Device Provisioning Service here).
- IoT Hub supports multiple endpoints, so one route can go to Time Series Insights and another to a database such as Cosmos DB.
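On the device side, a minimal sketch might look like this, assuming the C# device SDK (Microsoft.Azure.Devices.Client) and a hypothetical weather payload; the environment variable name, device id, and property values are illustrative only:

```csharp
using System;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

class WeatherDevice
{
    static async Task Main()
    {
        // Device connection string taken from an environment variable (assumed name)
        string connStr = Environment.GetEnvironmentVariable("IOTHUB_DEVICE_CONNECTION_STRING");
        using var client = DeviceClient.CreateFromConnectionString(connStr, TransportType.Mqtt);

        // Hypothetical weather telemetry payload
        var telemetry = new { deviceId = "weather-01", timestamp = DateTime.UtcNow, temperature = 21.4, humidity = 63.0 };
        using var message = new Message(Encoding.UTF8.GetBytes(JsonSerializer.Serialize(telemetry)))
        {
            ContentType = "application/json",
            ContentEncoding = "utf-8"
        };
        // Application property that a routing rule (e.g. to Time Series Insights or Cosmos DB) can filter on
        message.Properties.Add("messageType", "telemetry");

        await client.SendEventAsync(message);
    }
}
```

The routing itself (which endpoint each message ends up at) is configured on the IoT Hub, not in the device code.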
Note: Time Series Insights retains data for a maximum of 400 days; anything older is deleted.
Regarding reports: Time Series Insights has extensive reporting components and it is very fast. You can also access it programmatically, for example from C#.
Important note: before designing any cloud architecture, please work out the sizing of the data, i.e. the frequency and size of the messages. Based on that, you can select the right resources in the Azure cloud.
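To make that concrete, here is a rough back-of-the-envelope sketch; every constant is an assumption you should replace with your own numbers:

```csharp
using System;

// Rough sizing sketch - all inputs are assumptions
const int devices = 1_000;        // number of weather stations
const int messageBytes = 1_024;   // average size of one D2C message
const int intervalSeconds = 30;   // sampling frequency per device

double messagesPerDay = devices * (86_400.0 / intervalSeconds);               // ~2.88 million
double gbPerDay = messagesPerDay * messageBytes / (1024.0 * 1024.0 * 1024.0); // ~2.75 GB
double tbPerYear = gbPerDay * 365.0 / 1024.0;                                 // ~1 TB

Console.WriteLine($"{messagesPerDay:N0} messages/day, {gbPerDay:F1} GB/day, {tbPerYear:F2} TB/year");
```

Roughly speaking, the messages/day figure drives the IoT Hub tier and the GB/TB figures drive the storage choice.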
Please read this
http://download.microsoft.com/download/A/4/D/A4DAD253-BC21-41D3-B9D9-87D2AE6F0719/Microsoft_Azure_IoT_Reference_Architecture.pdf
If you choose IoT Hub to handle communication, you have a few options for handling the data (make sure IoT Hub is the right choice for you; if you don't need bidirectional communication, Azure Event Hubs may be a better choice, as it is a lot cheaper when dealing with big data).
- Stream Analytics - will let you output the incoming data to SQL Database, Blob storage, Event Hubs, Table Storage, Service Bus queues & topics, DocumentDB, Power BI and Data Lake Store. With this option you won't have to manage your own worker to handle the data.
- EventProcessorHost - here you will have to write your own implementation for receiving the data and storing it (see the sketch after this list). This option gives you the flexibility to store the data in any storage you want, but you will have to manage the hosting of the EPH yourself. An Azure Worker Role is a good choice for hosting and scaling.
- Storm (HDInsight) - you can use Apache Storm to read data from IoT Hub; it also gives you real-time computation options that are much broader than what Stream Analytics provides. After reading the data with Storm you can likewise store it in any storage you want. Be aware that Storm on Azure is very expensive and may be overkill for your application.
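If you go down the EventProcessorHost route, a minimal sketch could look like this, assuming the Microsoft.Azure.EventHubs.Processor package and the IoT Hub's built-in Event Hub-compatible endpoint; the connection strings, lease container name, and the actual storage write are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Microsoft.Azure.EventHubs.Processor;

// Processes D2C events from one partition and hands each message to your storage of choice.
class TelemetryProcessor : IEventProcessor
{
    public Task OpenAsync(PartitionContext context) => Task.CompletedTask;

    public Task CloseAsync(PartitionContext context, CloseReason reason) => Task.CompletedTask;

    public Task ProcessErrorAsync(PartitionContext context, Exception error)
    {
        Console.WriteLine($"Partition {context.PartitionId} error: {error.Message}");
        return Task.CompletedTask;
    }

    public async Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> events)
    {
        foreach (var data in events)
        {
            string json = Encoding.UTF8.GetString(data.Body.Array, data.Body.Offset, data.Body.Count);
            // TODO: write `json` to SQL / DocumentDB / blob here
            Console.WriteLine(json);
        }
        await context.CheckpointAsync(); // persist progress so a restarted host resumes where it left off
    }
}

class Host
{
    static async Task Main()
    {
        var host = new EventProcessorHost(
            "<event-hub-compatible-name>",               // from the IoT Hub's built-in endpoints blade
            PartitionReceiver.DefaultConsumerGroupName,
            "<event-hub-compatible-connection-string>",
            "<azure-storage-connection-string>",         // used only for leases/checkpoints
            "eph-leases");                               // lease container name (assumed)

        await host.RegisterEventProcessorAsync<TelemetryProcessor>();
        Console.ReadLine();                              // keep the host running
        await host.UnregisterEventProcessorAsync();
    }
}
```

The storage account here is only used by EventProcessorHost for partition leases and checkpoints; you can run several host instances (for example in a Worker Role) and they will split the partitions between them, which is how this option scales out.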
As for reporting - it really depends on your needs. I would avoid Blob/Table storage for any complex reporting; those two are optimized for storing a lot of data, not for running complex queries.
If you want to build your own reporting/queries you can choose SQL Database or DocumentDB, but if you go with a NoSQL solution, make sure you will actually benefit from the schema-less architecture.
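For example, with DocumentDB (shown here via its successor SDK, Microsoft.Azure.Cosmos), readings with different sensor fields can live in the same container because there is no fixed schema; the database/container names, fields, and partition key choice below are assumptions:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Illustrative telemetry document - extra sensor fields can vary per device because the store is schema-less
public record WeatherReading(string id, string deviceId, DateTime timestamp, double temperature, double? humidity);

class Demo
{
    static async Task Main()
    {
        var client = new CosmosClient(Environment.GetEnvironmentVariable("COSMOS_CONNECTION_STRING"));
        Container container = client.GetContainer("weather", "telemetry"); // assumed database and container names

        var reading = new WeatherReading(Guid.NewGuid().ToString(), "weather-01", DateTime.UtcNow, 21.4, 63.0);
        // Partitioning by device id keeps per-device queries cheap (assumed partition key design)
        await container.CreateItemAsync(reading, new PartitionKey(reading.deviceId));
    }
}
```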
For a PaaS reporting solution you can choose Power BI:
https://powerbi.microsoft.com/en-us/blog/outputting-real-time-stream-analytics-data-to-a-power-bi-dashboard/
Disclaimer - I've answered your question on the assumption you want to use the Azure stack.
Good luck
@KristerJohansson, according to your description and my understanding, this is an IoT solution in which a data collector receives weather data from devices with sensors and stores the data for analysis and reporting. I think there are some key factors that need to be considered as the variables that determine the data volume, such as the weather data columns, data format, sampling frequency, number of devices, etc.
So, considering scalability and big data, in my experience the best practice is to use IoT Hub for handling communication and Stream Analytics to retrieve data from IoT Hub and store it in Blob Storage. After some years of data collection, you can use Azure Machine Learning to read that data from Blob Storage for analysis and reporting.
If you have any concerns, please feel free to let me know.
Azure has added an interesting new feature for your problem.
It is now possible to route IoT messages directly to Azure Storage.
https://azure.microsoft.com/en-us/blog/route-iot-device-messages-to-azure-storage-with-azure-iot-hub/
I haven't tested it out yet but the article looks promising.