I am going to deal with a huge amount of data in my project. I have read about big data concepts but have never used them. Even after reading the big data documentation, I am still not sure whether my requirements call for big data tools or whether a traditional relational database would handle them.
Here is some information about my DB.
My main DB is a repository for several data sources. All of these data sources deal with the same kind of data (data in the same domain), but some contain extra fields that are not available in the others, and some contain fewer. In other words, some fields are common to all the data sources and some are not. My core DB should therefore contain all of those fields: approximately 2000 fields in total, and it may hold 10 to 20 million records.
The operations on my core DB will be data insertion and reading (searching). Since it deals with a huge amount of data, I was thinking of using big data concepts, but I am still not sure whether this is a good fit, because some of my data shares the same characteristics (same fields) and some contains extra information. I also need every kind of search on my DB to be fast.
Thanks.
Relational databases like MySQL can handle billions of rows / records, so the decision will depend on your use case(s). For Big Data NoSQL systems, it is very important to understand how the strengths and limitations of each system map to your use case(s), as these systems can behave very differently from one another.
Here are some MySQL examples:
- 1.1 billion rows on Percona DB (fork of MySQL)
- 0.95 billion rows on MySQL
In the second example, they moved from MySQL to Redis because they needed to store the equivalent of 359 billion rows, far more than the 950 million they were storing in MySQL.
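To put those numbers next to yours, here is a rough back-of-the-envelope calculation. It only uses figures quoted in the question and in the examples above; the variable names are my own, and treating all 2000 fields as populated in every record is a worst-case assumption, since you say many fields are missing in some sources.

```python
# Figures taken from the question and the examples above.
asker_rows_max = 20_000_000            # upper bound on records in the question
asker_fields = 2_000                   # approximate number of fields in the question
mysql_example_rows = 950_000_000       # rows held in MySQL in the second example
redis_target_rows = 359_000_000_000    # row-equivalents that prompted the move to Redis

# How many times larger the MySQL example is than the question's upper bound.
headroom = mysql_example_rows / asker_rows_max

# How many times larger the Redis target was than what they kept in MySQL.
growth = redis_target_rows / mysql_example_rows

# Worst case: every field populated in every record (an assumption, see above).
max_cells = asker_rows_max * asker_fields

print(f"MySQL example vs. your row count: {headroom:.1f}x")
print(f"Redis target vs. MySQL example:   {growth:.0f}x")
print(f"Worst-case field values to store: {max_cells:,}")
```

By row count alone, your data set is well below what the MySQL examples handle. The field count is the part that needs more thought, which is why the kind of searches you need matters.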
Given that you say you have fast searching requirements, it is important to understand what kinds of searches you need, because different databases support different kinds of search, and some of the supported searches may have limited functionality. If your search requirements go beyond the core functionality of the data store, a dedicated full-text search solution is often added alongside it, for example, using Cassandra for the data store and Elasticsearch for the search component.
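The "data store plus separate search component" pattern can be sketched in a few lines. This is a toy, in-memory illustration only: `RecordStore`, `SearchIndex`, `insert` and `find` are hypothetical names of mine, and none of this is the Cassandra or Elasticsearch API. It just shows the division of labour: the store answers lookups by key, and the index maps search terms back to keys.

```python
from collections import defaultdict


class RecordStore:
    """Toy stand-in for the primary data store: lookup by key only."""

    def __init__(self):
        self._rows = {}

    def put(self, record_id, fields):
        # Records may carry different fields; nothing forces a common schema.
        self._rows[record_id] = dict(fields)

    def get(self, record_id):
        return self._rows[record_id]


class SearchIndex:
    """Toy stand-in for the search component: an inverted index over values."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, record_id, fields):
        # Split every field value into lowercase tokens and remember the record.
        for value in fields.values():
            for token in str(value).lower().split():
                self._postings[token].add(record_id)

    def search(self, query):
        # Return the ids of records that contain every token in the query.
        result = None
        for token in query.lower().split():
            ids = self._postings.get(token, set())
            result = set(ids) if result is None else result & ids
        return result if result is not None else set()


def insert(store, index, record_id, fields):
    # Every write goes to both components.
    store.put(record_id, fields)
    index.index(record_id, fields)


def find(store, index, query):
    # The index yields ids; the store yields the full records.
    return [store.get(rid) for rid in sorted(index.search(query))]


store, index = RecordStore(), SearchIndex()
insert(store, index, 1, {"name": "Acme Widget", "color": "red"})
insert(store, index, 2, {"name": "Acme Gadget", "weight_kg": 3})
print(find(store, index, "acme"))
print(find(store, index, "red widget"))
```

The cost of this design is that every write has to reach two systems, so you need to decide how to keep them in sync.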
To provide some background for this decision, it is useful to consider your requirements with respect to the CAP theorem, which states that a distributed computer system cannot provide all three of the following guarantees at the same time (from Wikipedia):
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it succeeded or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
http://en.wikipedia.org/wiki/CAP_theorem
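To make the trade-off concrete, here is a deliberately simplified sketch of two replicas during a network partition. `TwoNodeStore` and its "CP"/"AP" modes are hypothetical names of mine, not how any real database is implemented. In "CP" mode a write is refused while the replicas cannot talk to each other (consistency is kept, availability is lost); in "AP" mode the write is accepted on one replica only (availability is kept, the replicas disagree).

```python
class TwoNodeStore:
    """Toy model of two replicas that may be cut off from each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # one dict per replica
        self.partitioned = False   # True while the replicas cannot communicate

    def write(self, node_id, key, value):
        """Return True if the write was accepted, False if it was refused."""
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            for replica in self.replicas:
                replica[key] = value
            return True
        if self.mode == "CP":
            # Refuse the write rather than let the replicas diverge.
            return False
        # AP: accept the write on the reachable replica only.
        self.replicas[node_id][key] = value
        return True

    def read(self, node_id, key):
        return self.replicas[node_id].get(key)


cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(0, "x", 2)   # refused during the partition

ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(0, "x", 2)   # accepted, but only on replica 0

print("CP:", cp_accepted, cp.read(0, "x"), cp.read(1, "x"))
print("AP:", ap_accepted, ap.read(0, "x"), ap.read(1, "x"))
```

Real systems are far more nuanced (quorums, tunable consistency, conflict resolution after the partition heals), but this is the basic choice the theorem describes.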
Graphically, you can see how different database solutions including MySQL and NoSQL solutions map out here:
If you provide more information on your use case(s), you can get more detailed responses.