I am learning to use Hadoop for Big Data processing.
I need to run some queries on a collection of data sets split across 8 .xls files. Each .xls file has multiple sheets, and the queries concern only one of them.
The dataset can be downloaded here: http://www.census.gov/hhes/www/hlthins/data/utilization/tables.html
I am not using a commercial Hadoop distribution; I just have one master VM and one slave VM set up in VMware, with Hadoop, Hive, and Pig installed on both.
I am a novice with Hadoop and Big Data, so I'd be very grateful if anyone could guide me on how to proceed.
If you need information on the queries or anything else, let me know.
Thanks.