Database for a web crawler in Python?

Posted 2019-04-03 00:45

Question:

Hi, I'm writing a web crawler in Python to extract news articles from news websites like nytimes.com. What would be a good database to use as a backend for this project?

Thanks in advance!

Answer 1:

This could be a great project to use a document database like CouchDB, MongoDB, or SimpleDB.

MongoDB has a hosted solution: http://mongohq.com. There is also a Python binding (PyMongo).

SimpleDB is a great choice if you are hosting this on Amazon Web Services.

CouchDB is an open source project from the Apache Software Foundation.
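As a rough sketch of what the PyMongo binding looks like in practice. The helper names, the field layout (url/title/body/crawled_at), and the database/collection names here are my own assumptions, not anything prescribed by MongoDB, and the actual insert assumes a running MongoDB server:

```python
# Sketch: storing a crawled article as a MongoDB document with PyMongo.
# A document database is schema-less, so each article can carry whatever
# fields the crawler happens to extract.
from datetime import datetime, timezone

def make_article_doc(url, title, body):
    """Build the dict we would insert as a document (illustrative schema)."""
    return {
        "url": url,
        "title": title,
        "body": body,
        "crawled_at": datetime.now(timezone.utc),
    }

def store_article(doc, mongo_uri="mongodb://localhost:27017"):
    """Upsert keyed on URL so re-crawling a page does not create duplicates.
    Requires the pymongo package and a reachable MongoDB server."""
    from pymongo import MongoClient  # imported here so make_article_doc
    client = MongoClient(mongo_uri)  # works even without pymongo installed
    client.crawler.articles.replace_one({"url": doc["url"]}, doc, upsert=True)

doc = make_article_doc("https://www.nytimes.com/example", "Example", "text…")
print(doc["url"])
```

The upsert-by-URL pattern is one common way to make a crawler idempotent: visiting the same page twice simply refreshes the stored document.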



Answer 2:

Personally, I love PostgreSQL, but other free databases such as MySQL (or, if you have a reasonably small amount of data, a few GB at most, even the SQLite that comes with Python) will be fine too.
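Since SQLite ships with Python, getting started costs almost nothing. A minimal sketch, where the table layout and the example URL are just assumptions for illustration:

```python
# Minimal article store using the sqlite3 module bundled with Python.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS articles (
        url   TEXT PRIMARY KEY,
        title TEXT,
        body  TEXT
    )
    """
)
# INSERT OR REPLACE keeps re-crawled pages from creating duplicate rows,
# since url is the primary key.
conn.execute(
    "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
    ("https://www.nytimes.com/example", "Example title", "Article text…"),
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
print(count)  # 1
```

Swapping the `:memory:` path for a filename gives you a durable single-file database with zero server administration, which is exactly why SQLite suits small crawls.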



Answer 3:

I think the database itself will probably be one of the easier aspects of a web crawler like this.

If you expect heavy read or write load on the database (for example, if you intend to run many crawlers at the same time), then you will want to steer toward MySQL; otherwise something like SQLite will probably do just fine.



Answer 4:

You can take a look at Firebird.

The Firebird Python driver is developed by the core team.