I am very new to Hadoop and HBase and have some conceptual questions that are tripping me up in every tutorial I've found.
I have Hadoop and HBase running on a single node inside an Ubuntu VM on my Windows 7 system. I have a CSV file that I would like to load into a single HBase table.
The columns are: loan_number, borrower_name, current_distribution_date, loan_amount
I know that I need to write a MapReduce job to load this CSV file into HBase. The following tutorial describes the Java needed to write such a MapReduce job: http://salsahpc.indiana.edu/ScienceCloud/hbase_hands_on_1.htm
What I'm missing is:
Where do I save these files, and where do I compile them? Should I compile them on my Windows 7 machine running Visual Studio 12 and then move the result to the Ubuntu VM?
I read this SO question and its answers, but I guess I'm still missing the basics: Loading CSV File into Hbase table using MapReduce
I can't find anything covering these basic Hadoop/HBase logistics. Any help would be greatly appreciated.
You can save the MapReduce classes anywhere (either on Win 7 or in the Ubuntu VM). You can compile them anywhere too. Just build a JAR file with the classes you created and use that JAR to run the MapReduce job in your VM.
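For example, on the VM you could compile and package along these lines (the hbase classpath helper is just one way to pick up the needed jars, and the file names here are placeholders for your own):

    javac -cp `hbase classpath` CsvToHBaseJob.java
    jar cf csv-loader.jar *.class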
Then, in your Ubuntu VM, after starting Hadoop you can use a command like the following to run the MapReduce class you created.
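A minimal sketch, where the JAR name, class name, and input path are placeholders for whatever you built (the HADOOP_CLASSPATH prefix puts the HBase jars on the job's classpath):

    HADOOP_CLASSPATH=`hbase classpath` hadoop jar csv-loader.jar CsvToHBaseJob /user/hduser/loans.csv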
When you run the above command, the MapReduce class you wrote will be executed and the HBase table will be populated.
Hope this helps
There is no need to code a MapReduce job to bulk load data into HBase. There are several ways to do it:
1) Use HBase tools like importtsv and completebulkload: http://hbase.apache.org/book/arch.bulk.load.html (a sample importtsv invocation is sketched after this list)
2) Use Pig to bulk load data, e.g. with its HBaseStorage loader.
3) Do it programmatically using the HBase API. I have a small project called hbaseloader that loads files into an HBase table (the table has just one ColumnFamily holding the content of each file). Take a look at it; you just need to define the structure of your table and modify the code to read and parse a CSV file. A minimal sketch of this approach also follows the list.
4) Do it programmatically using a MapReduce job like in the example you mentioned.
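For option 1, a sketch of what an importtsv run could look like for your four columns (the table name loans and column family cf are assumptions; create them in the HBase shell first, and the input file must already be in HDFS):

    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
        -Dimporttsv.separator=, \
        -Dimporttsv.columns=HBASE_ROW_KEY,cf:borrower_name,cf:current_distribution_date,cf:loan_amount \
        loans /user/hduser/loans.csv

Here loan_number becomes the row key. If you want to generate HFiles and load them with completebulkload instead of writing puts directly, add -Dimporttsv.bulk.output with an output directory.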
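For option 3, a minimal sketch of the plain-API approach, not the hbaseloader code itself; the table name loans and column family cf are assumptions, and it uses the classic HTable client API:

    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CsvLoader {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "loans");   // assumed table name
            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",");
                // loan_number is the row key; the other columns go into the cf family
                Put put = new Put(Bytes.toBytes(f[0]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("borrower_name"), Bytes.toBytes(f[1]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("current_distribution_date"), Bytes.toBytes(f[2]));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("loan_amount"), Bytes.toBytes(f[3]));
                table.put(put);
            }
            in.close();
            table.close();
        }
    }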