I am using MongoDB to store the raw HTML of web pages scraped with the Scrapy framework. A single day of scraping fills up 25 GB of disk space. Is there a way to store the raw data in a compressed format?
Starting with the 2.8 version of Mongo (released as 3.0), you can use compression. The WiredTiger engine gives you three levels of block compression: none, snappy (the default), and zlib. MMAPv1, the default engine in 2.6, does not provide compression.
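For example, a minimal mongod.conf sketch that selects zlib server-wide might look like this (the option names are from the MongoDB 3.0 documentation; the rest of the file is assumed):

    # mongod.conf (YAML format) -- minimal sketch, not a complete config
    storage:
      engine: wiredTiger
      wiredTiger:
        collectionConfig:
          blockCompressor: zlib   # one of: none, snappy (default), zlib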
As an example of how much space you can save for 16 GB of data, see the comparison chart in this article (the image is not reproduced here).
MongoDB itself (before 3.0) has nothing built in for compression. Some operating systems offer disk- or file-level compression, but if you want more control, I'd suggest compressing the data yourself with a library for whatever programming language you're using.
For example, Node.js offers simple convenience methods for this: http://nodejs.org/api/zlib.html#zlib_examples
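Since the question uses Scrapy, here is the Python equivalent: a minimal sketch that zlib-compresses each page before inserting it and decompresses it on the way out (the default localhost connection and the scraped.pages database/collection names are assumptions):

    import zlib

    import pymongo
    from bson.binary import Binary

    client = pymongo.MongoClient()     # assumed default localhost connection
    collection = client.scraped.pages  # hypothetical database/collection names

    def store_page(url, raw_html):
        # Compress the UTF-8 encoded HTML; level 9 favours size over CPU.
        packed = zlib.compress(raw_html.encode('utf-8'), 9)
        collection.insert_one({'url': url, 'html': Binary(packed)})

    def load_page(url):
        doc = collection.find_one({'url': url})
        # Decompress back to the original HTML string.
        return zlib.decompress(doc['html']).decode('utf-8')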
3.0 Update
If you choose to switch to the new WiredTiger storage engine, which ships with 3.0, you can choose between several types of compression, as documented here. You'll want to test the change under production-like workloads to see whether the additional CPU utilization is worth the space savings.
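The compressor can also be set per collection at creation time. A sketch with pymongo, assuming the server is already running WiredTiger (the scraped database and pages collection names are hypothetical):

    import pymongo

    db = pymongo.MongoClient().scraped  # hypothetical database name

    # Ask WiredTiger to use zlib for this collection even if the
    # server-wide default compressor is snappy.
    db.create_collection(
        'pages',
        storageEngine={'wiredTiger': {'configString': 'block_compressor=zlib'}},
    )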
You can compress your string before storing it like this: myhtml.encode('zlib'). Note that this codec shortcut only works on Python 2; on Python 3, use the zlib module directly, e.g. zlib.compress(myhtml.encode('utf-8')).
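To gauge whether this is worth doing for your pages, you can measure the ratio on a sample file first; a quick sketch (page.html is a hypothetical sample page, the code runs on both Python 2 and 3):

    import zlib

    raw = open('page.html', 'rb').read()  # hypothetical sample page
    packed = zlib.compress(raw, 9)
    saved = 100.0 * (1 - float(len(packed)) / len(raw))
    print('%d -> %d bytes (%.0f%% saved)' % (len(raw), len(packed), saved))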