What database works well with 200+GB of data?

Posted 2019-06-01 09:43

Question:

I've been using MySQL (with InnoDB, on Amazon RDS) because it's the sort of universal default, but it's been ridiculously under-performing, and tweaking it only delays the inevitable.

The data is mostly relatively short blobs (<1 kB each) of information about hundreds of millions of URLs. There is (or should be; MySQL can't seem to handle it) a very high volume of inserts / updates / retrievals but few complex queries - not because complex queries wouldn't be useful, but because MySQL is so slow that it's far faster to get the data out, process it locally, and cache the results somewhere.
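To make the access pattern concrete, here is roughly what the hot path looks like (a sketch only; the table name, columns, and connection details below are made up for illustration):

```python
# Sketch of the hot path: look up, insert, or update one small blob per URL.
# Table, column, and connection names are illustrative, not my real schema.
import pymysql

conn = pymysql.connect(host="my-rds-host", user="app", password="...",
                       database="crawler", autocommit=True)

def upsert_url(url, blob):
    """Insert or update the <1 kB blob stored for a single URL."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO urls (url, data) VALUES (%s, %s) "
            "ON DUPLICATE KEY UPDATE data = VALUES(data)",
            (url, blob),
        )

def fetch_url(url):
    """Return the blob for a URL, or None if it isn't stored yet."""
    with conn.cursor() as cur:
        cur.execute("SELECT data FROM urls WHERE url = %s", (url,))
        row = cur.fetchone()
        return row[0] if row else None
```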

I can keep tweaking MySQL and throwing more hardware at it, but it seems increasingly futile.

So what are the options? SQL, the relational model, etc. are optional - anything will do as long as it's fast, networked, and language-independent.

Answer 1:

Have you done any sort of end-to-end profiling of your application and MySQL database? To give better advice, it would also help to know what improvements you have already tried and what your database structure looks like. You haven't given much information on how your MySQL database is configured either, and MySQL provides a lot of options for tuning.

You should pick up a copy of High Performance MySQL if you haven't already to learn more about the product.

There is no point in doing anything until you know what your problem is. NoSQL solutions can offer performance benefits but you have provided little evidence that MySQL is incapable of servicing your needs.
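As one concrete starting point, a check like the following (a rough sketch; the connection details are placeholders) tells you whether the InnoDB buffer pool is even close to large enough for your working set:

```python
# Compare InnoDB buffer pool read requests with reads that had to go to disk.
# A low hit ratio usually means the buffer pool is too small for the working set.
import pymysql

conn = pymysql.connect(host="my-rds-host", user="app", password="...")

with conn.cursor() as cur:
    cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
    status = {name: int(value) for name, value in cur.fetchall()}

requests = status["Innodb_buffer_pool_read_requests"]  # logical read requests
disk_reads = status["Innodb_buffer_pool_reads"]        # reads that missed the pool
print("buffer pool hit ratio: %.4f" % (1.0 - float(disk_reads) / requests))
```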



Answer 2:

Well "Fast, networked and language-independent" + "few complex queries" brings to mind the various NoSQL solutions. To name a few:

  • MongoDB
  • CouchDB
  • Cassandra

And if that's not fast enough, there is always the wicked-fast Redis, which is my personal favorite at the moment. :) It is not a database per se, but it's good enough for most scenarios.
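To give an idea of how little ceremony the per-URL workload needs on Redis, here is a minimal sketch (redis-py; the key layout is just an assumption about your data):

```python
# Minimal per-URL blob store on Redis: SET doubles as insert and update.
import redis

r = redis.Redis(host="localhost", port=6379)

def upsert_url(url, blob):
    r.set("url:" + url, blob)   # insert or overwrite the blob for this URL

def fetch_url(url):
    return r.get("url:" + url)  # raw bytes, or None if the key is missing
```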

I am sure other people can list more NoSQL databases...
and there is always http://nosql-database.org/ .

Generally speaking, databases in this category are better and faster for your scenario because they relax constraints, which makes frequent inserts, updates, and retrievals easier and faster. But that requires you to think harder about your data model, and it is generally not possible to run SQL-style complex queries directly; instead you write more pre-computed data or use a more denormalized design to make up for the lack of complex queries.
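For example, "writing more pre-computed data" can simply mean storing derived values next to the blob at write time, so reads stay a single key lookup (a pymongo sketch; the field names are assumptions):

```python
# Denormalized per-URL document: the blob plus fields you would otherwise
# derive with a complex query (here an assumed "domain" field).
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017").crawler.urls

def upsert_url(url, blob, domain):
    coll.update_one(
        {"_id": url},                               # the URL itself is the key
        {"$set": {"data": blob, "domain": domain}},
        upsert=True,                                # insert if missing, else update
    )

def fetch_url(url):
    return coll.find_one({"_id": url})              # single-document lookup
```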

But since complex queries are a minor concern in your case, I think NoSQL solutions are ideal for you.



Answer 3:

With the information you've given about your application's data and workload, it is almost impossible to determine whether the problem really is MySQL itself or something else. You seem to assume that you can throw any workload at a relational engine and it should just handle it. The suggestions made by other commenters about analyzing the performance more carefully are therefore valid in my opinion. Without more data (transactions per second, etc.), any further analysis of other suitable engines is also futile.



Answer 4:

I'm not sure I agree with the advice to jump ship on traditional databases. It might not be the most efficient tool, but it is the one that is FAR more widely understood and used, and I strongly doubt you have a problem that can't be handled by an efficiently set up relational database. Obvious answers are Oracle, SQL Server, etc., but it might just be that your database structure isn't right. I don't know much about MySQL, but I do know it's used in some pretty big projects (eBay being a noteworthy one).
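To give one example of what "structure isn't right" can look like in practice (sketched in MySQL syntax for concreteness; I haven't seen your schema, so this is only an illustration): indexing hundreds of millions of long URL strings directly makes for a bloated primary key, whereas keying on a fixed-width hash of the URL keeps the index small and uniform.

```python
# Hypothetical restructuring: key the table on a 16-byte hash of the URL
# instead of the URL string itself, so the primary key index stays compact.
import hashlib
import pymysql

conn = pymysql.connect(host="my-rds-host", user="app", password="...",
                       database="crawler", autocommit=True)

with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS urls (
            url_hash BINARY(16) NOT NULL PRIMARY KEY,  -- MD5 of the URL
            url      TEXT       NOT NULL,
            data     BLOB       NOT NULL
        ) ENGINE=InnoDB
    """)

def upsert_url(url, blob):
    key = hashlib.md5(url.encode("utf-8")).digest()
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO urls (url_hash, url, data) VALUES (%s, %s, %s) "
            "ON DUPLICATE KEY UPDATE data = VALUES(data)",
            (key, url, blob),
        )
```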