I am very new to the big data space.
Our team suggested that we use HBase instead of an RDBMS for high performance. We have no idea what should or must be considered before switching from an RDBMS to HBase. Any ideas?
One of my favourite books describes..
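Coming to @Whitefret's last point: there is something called the CAP theorem, on which the decision can be based. It states that a distributed data store can guarantee at most two of the following three properties at the same time: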
Consistency (all nodes see the same data at the same time)
Availability (every request receives a response about whether it succeeded or failed)
Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
As for actually moving the data from an RDBMS to HBase, you can use Apache Sqoop.
It's a difficult question, and there are many things to consider.
If you can answer those questions and you think NoSQL is the right fit, ask your team how they feel about it. NoSQL databases come with problems you would never meet in the SQL world. The team should build a prototype first to understand how all this works, and some training may need to be made available to them.
In summary:
- Find out whether you need a non-relational database
- Choose the right one (is HBase really what you need? Why not consider Cassandra or MongoDB?)
HBase, like all NoSQL databases, comes with great new features, but sadly nothing is free (not to mention the monetary cost).
In HBase, you really should check whether all the queries you might want to run can be fulfilled with the HBase data model. An important thing to consider is the schema design (first and foremost, the modelling of the row key). I advise you to read this really good paper:
http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf
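To illustrate why row key design matters so much, here is a minimal Python sketch. It does not use the HBase client API: `SortedTable` is a hypothetical in-memory stand-in that only mimics the fact that HBase keeps rows sorted by row key, and `make_rowkey`, `MAX_TS` and the `user#reversed-timestamp` layout are assumptions chosen for the example. The point is that a query such as "latest events for one user" is cheap only when the key was designed so that those rows sit next to each other.

```python
import bisect

# Hypothetical upper bound for millisecond timestamps (13 digits).
MAX_TS = 10**13 - 1


def make_rowkey(user_id: str, ts_millis: int) -> bytes:
    """Build a composite key: user id, then a reversed timestamp.

    Reversing the timestamp makes the newest event of a user sort first.
    """
    reversed_ts = MAX_TS - ts_millis
    return f"{user_id}#{reversed_ts:013d}".encode()


class SortedTable:
    """Toy table sorted by row key (an illustration, not the HBase API)."""

    def __init__(self):
        self._keys = []    # row keys kept in lexicographic order
        self._values = {}  # row key -> value

    def put(self, key: bytes, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan_prefix(self, prefix: bytes, limit=None):
        """Return rows whose key starts with prefix, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        rows = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            rows.append((key, self._values[key]))
            if limit is not None and len(rows) >= limit:
                break
        return rows


table = SortedTable()
table.put(make_rowkey("alice", 1000), "login")
table.put(make_rowkey("alice", 3000), "purchase")
table.put(make_rowkey("bob", 2000), "login")
table.put(make_rowkey("alice", 2000), "logout")

# The two most recent events for alice: one short contiguous scan.
latest = table.scan_prefix(b"alice#", limit=2)
print([value for _, value in latest])  # ['purchase', 'logout']
```

A query that this key does not anticipate, for example "all purchases across all users", would have to read every row in the toy table. That is the kind of query you should list up front and check against the data model before committing to it.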
I think that a really good answer to your question can be found on the HBase official site.
"HBase isn’t suitable for every problem.
First, make sure you have enough data. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.
Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.) An application built against an RDBMS cannot be "ported" to HBase by simply changing a JDBC driver, for example. Consider moving from an RDBMS to HBase as a complete redesign as opposed to a port.
Third, make sure you have enough hardware. Even HDFS doesn’t do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.
HBase can run quite well stand-alone on a laptop - but this should be considered a development configuration only. "
https://hbase.apache.org/book.html
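To make the first and third points of that quote a bit more concrete, here is a rough back-of-the-envelope sketch in Python. The function `estimate_footprint` and its parameters are hypothetical. The 10 GiB region size is an assumption (check the split size configured on your cluster), the replication factor of 3 is the HDFS default mentioned in the quote, and compression and per-cell storage overhead are ignored, so treat the numbers as an order-of-magnitude guide only.

```python
import math

GIB = 1024**3


def estimate_footprint(rows, avg_row_bytes, region_bytes=10 * GIB, replication=3):
    """Rough estimate of logical size, region count and raw HDFS usage.

    region_bytes is an assumed region split size; replication is the
    HDFS block replication factor.
    """
    data_bytes = rows * avg_row_bytes
    regions = max(1, math.ceil(data_bytes / region_bytes))
    raw_hdfs_bytes = data_bytes * replication
    return {
        "data_gib": data_bytes / GIB,
        "regions": regions,
        "raw_hdfs_gib": raw_hdfs_bytes / GIB,
    }


# A few million rows of about 1 KiB each fit in a single region,
# so one region server does all the work.
small = estimate_footprint(rows=5_000_000, avg_row_bytes=1024)
print(small["regions"])  # 1

# A couple of billion such rows spread over many regions.
large = estimate_footprint(rows=2_000_000_000, avg_row_bytes=1024)
print(large["regions"])  # 191
```

With the small data set everything lands in one region, which matches the warning that the rest of the cluster may sit idle. With the large one, the raw HDFS usage is three times the logical data size, which is part of why a handful of machines is not enough.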