How does Fluentd benefit this scenario?

Posted 2019-08-03 05:58

Question:

I've come across Fluentd. Why would you use such a thing when it's easy enough to store raw data in a database directly? I might be misunderstanding the purpose of the technology here, so I'd be glad to hear some feedback.

Why would anyone want to go through another layer when it's easy enough to capture and store raw data in your own data store?

Consider this scenario: I want to store page views. Raw data is stored in an RDBMS and formatted data is stored in MongoDB. This is a short description of my current setup:

When a user visits my site, my application (Rails) resolves the IP to a country. After the IP is resolved, I store the raw data in an RDBMS. A worker/cron job runs every hour to process all the raw data into Mongo documents. Why would I need Fluentd there? What are the benefits of having a logging framework in this instance?

Answer 1:

  1. You don't need to make/maintain your own worker to move stuff between your first RDBMS and Mongo.
  2. You get very easy parallelization and redundancy of the process that moves data into Mongo. You could build this into your worker/cron job, but why would you want to reinvent the wheel?
  3. You asked why anyone would want another layer. Your worker/cron job is another layer, but way less tested than Fluentd.
  4. You get a bunch of free plugins, so if you want to start sending your data to additional places besides Mongo (e.g. Storm, S3, HDFS) you can do that really easily by editing a config file instead of writing a bunch of code yourself.
  5. You get a bunch of free built-in options, such as how frequently to flush your data and at what size to flush it.
  6. Most importantly: you move this entire logging/data input workflow off your app boxes, so if anything goes wrong with your data insert process, the problem will appear and be handled on your Fluentd log aggregator boxes and NOT on your app boxes.
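
To make points 4 and 5 a bit more concrete, here is a rough Python sketch of the general idea: records are buffered, flushed when a size or time limit is hit, and each flushed chunk is copied to every configured destination. This is only an illustration and not Fluentd's actual implementation or API. `BufferedForwarder`, `max_records`, `flush_interval` and the in-memory lists standing in for Mongo and a second destination are all made-up names for the example.

```python
import time
from typing import Callable, Dict, List

Record = Dict[str, object]
Output = Callable[[List[Record]], None]


class BufferedForwarder:
    """Toy stand-in for a log aggregator (hypothetical, not Fluentd itself).

    Buffers records and flushes them to every output once the buffer
    reaches max_records or flush_interval seconds have passed.
    """

    def __init__(self, outputs: List[Output], max_records: int = 3,
                 flush_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.outputs = list(outputs)
        self.max_records = max_records
        self.flush_interval = flush_interval
        self.clock = clock
        self.buffer: List[Record] = []
        self.last_flush = clock()

    def emit(self, record: Record) -> None:
        # The app only calls emit(); it never talks to the data stores.
        self.buffer.append(record)
        too_big = len(self.buffer) >= self.max_records
        too_old = self.clock() - self.last_flush >= self.flush_interval
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            chunk, self.buffer = self.buffer, []
            # Fan out: every destination receives its own copy of the chunk.
            for write in self.outputs:
                write(list(chunk))
        self.last_flush = self.clock()


# In-memory lists stand in for Mongo and a second destination (assumption).
mongo_docs: List[Record] = []
archive: List[Record] = []

forwarder = BufferedForwarder(outputs=[mongo_docs.extend, archive.extend],
                              max_records=3)

page_views = [("203.0.113.7", "JP"), ("198.51.100.4", "US"),
              ("192.0.2.9", "DE"), ("203.0.113.8", "JP")]
for ip, country in page_views:
    forwarder.emit({"ip": ip, "country": country, "path": "/"})

forwarder.flush()  # push whatever is still buffered
```

Adding another destination here means appending one more callable to `outputs`, which is roughly what editing the config file buys you with a real aggregator: the application code that calls `emit` does not change.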