What database would you use for logging (i.e. as a replacement for logfiles)?

Posted 2019-03-28 08:24

After analyzing some gigabytes of logfiles with grep and the like, I was wondering how to make this easier by logging into a database instead. What database would be appropriate for this purpose? A vanilla SQL database works, of course, but it provides lots of transactional guarantees that you don't need here, and which might make it slow when you work with gigabytes of data and very fast insertion rates. So a NoSQL database could be the right answer (compare this answer for some suggestions). Some requirements for the database would be as follows (a minimal schema sketch follows the list):

  • Ability to cope with gigabytes or maybe even terabytes of data
  • Fast insertion
  • Multiple indexes on each entry should be possible (e.g. time, session ID, URL, etc.)
  • If possible, it should store the data in a compressed form, since logfiles are usually extremely repetitive.
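
To make these requirements concrete, here is a minimal sketch of the kind of schema they imply, using SQLite (Python standard library) purely as a neutral stand-in; the table and column names are illustrative, not taken from any particular product:

```python
import sqlite3

conn = sqlite3.connect("logs.db")

# One row per log entry; columns match the fields you would want to index.
conn.execute("""
    CREATE TABLE IF NOT EXISTS log_entries (
        ts         TEXT,  -- event timestamp
        session_id TEXT,  -- session identifier
        url        TEXT,  -- requested URL
        message    TEXT   -- the raw log line / payload
    )
""")

# Multiple indexes on each entry, one per field you expect to filter on.
conn.execute("CREATE INDEX IF NOT EXISTS idx_ts      ON log_entries(ts)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_session ON log_entries(session_id)")
conn.execute("CREATE INDEX IF NOT EXISTS idx_url     ON log_entries(url)")
conn.commit()
```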

Update: There are already some SO questions on this: Database suggestion for processing/reporting on large amount of log file type data and What are good NoSQL and non-relational database solutions for audit/logging database. However, I am curious which databases fulfill which requirements.

3 Answers
小情绪 Triste *
#2 · 2019-03-28 08:27

After having tried a lot of NoSQL solutions, my best bets would be:

  • Riak + Riak Search for great scalability
  • denormalized data in MySQL/PostgreSQL
  • MongoDB if you don't mind waiting
  • CouchDB if you KNOW what you're searching for

Riak + Riak Search scale easily (really!) and give you free-form queries over your data. You can also easily mix data schemas, and you may even be able to compress data by using Innostore as a backend.
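
For flavor, a minimal sketch of storing a log entry with the official Python Riak client; the node address, bucket name, key, and fields here are assumptions, and querying would go through Riak Search rather than the plain key/value API shown:

```python
import riak

# Assumed local node; adjust host/ports for a real cluster.
client = riak.RiakClient(pb_port=8087)
bucket = client.bucket("logs")

# Store one log entry as a JSON document under an explicit key.
entry = bucket.new("2019-03-28T08:24:00Z-abc123",
                   data={"session_id": "abc123", "url": "/index.html"})
entry.store()
```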

MongoDB is annoying to scale past several gigabytes of data if you really want to use indexes and not slow to a crawl. Its single-node performance is really fast, and it offers index creation, but as soon as your working data set no longer fits in memory, it becomes a problem...
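
As a sketch of what that looks like in practice (the connection string, database, and field names are assumptions), inserting entries and creating the secondary indexes with pymongo:

```python
from pymongo import MongoClient, ASCENDING

logs = MongoClient("mongodb://localhost:27017").logdb.entries

logs.insert_one({"ts": "2019-03-28T08:24:00Z",
                 "session_id": "abc123",
                 "url": "/index.html"})

# Each secondary index speeds up one query pattern -- but note the
# caveat above: once the indexes plus the working set exceed RAM,
# inserts and queries slow down dramatically.
logs.create_index([("ts", ASCENDING)])
logs.create_index([("session_id", ASCENDING)])
logs.create_index([("url", ASCENDING)])
```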

MySQL/PostgreSQL are still pretty fast and allow free-form queries thanks to the usual B+tree indexes. Look at PostgreSQL's partial indexes if some of the fields don't show up in every record. They also offer compressed tables, and since the schema is fixed, you don't store your field names over and over again (which is what usually happens with a lot of the NoSQL solutions).
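
A sketch of such a partial index through psycopg2 (connection parameters and all names are placeholders; `IF NOT EXISTS` on indexes needs PostgreSQL 9.5 or later):

```python
import psycopg2

conn = psycopg2.connect("dbname=logdb")
cur = conn.cursor()

# Index only the rows that actually carry a session_id, so a field
# that is absent from most records doesn't bloat the index.
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_session_partial
        ON log_entries (session_id)
     WHERE session_id IS NOT NULL
""")
conn.commit()
```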

CouchDB is nice if you already know the queries you want to run; its incremental map/reduce-based views are a great system for that.
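
For illustration, a design document with one such incremental view, uploaded over CouchDB's HTTP API (the server URL, database, and view names are assumptions; the map function itself is JavaScript stored as a string):

```python
import json
import requests

design_doc = {
    "views": {
        "hits_per_url": {
            # Emit one row per document that has a url field...
            "map": "function(doc) { if (doc.url) emit(doc.url, 1); }",
            # ...and count rows per key with the built-in reducer.
            "reduce": "_count",
        }
    }
}
requests.put("http://localhost:5984/logdb/_design/logs",
             data=json.dumps(design_doc))

# Query it (results are recomputed incrementally as documents change):
#   GET /logdb/_design/logs/_view/hits_per_url?group=true
```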

来,给爷笑一个
#3 · 2019-03-28 08:50

There are a lot of different options you could look into. You could use Hive for your analytics and Flume to consume and load the logfiles. MongoDB might also be a good option for you; take a look at this article on log analytics with MongoDB, Ruby, and Google Charts.
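
In the MongoDB direction, the kind of aggregate the linked article charts might look like this with pymongo (the article uses Ruby; the database and field names here are illustrative):

```python
from pymongo import MongoClient

logs = MongoClient("mongodb://localhost:27017").logdb.entries

# Hits per URL, sorted -- the sort of summary you would feed to a chart.
pipeline = [
    {"$group": {"_id": "$url", "hits": {"$sum": 1}}},
    {"$sort": {"hits": -1}},
]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["hits"])
```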

smile是对你的礼貌
#4 · 2019-03-28 08:53

Depending on your needs, Splunk might be a good option. It is more than just a database: you get all kinds of reporting on top. Plus, it is designed to be a logfile replacement, so the scaling issues are already solved.
