As this is coming from a newbie...
I had Hadoop and Hive set up for me, so I can run Hive queries on my computer that access data on an AWS cluster. Can I run Hive queries against .csv data stored on my computer, as I did with MS SQL Server?
How do I load .csv data into Hive, then? What does it have to do with Hadoop, and in which mode should I run it?
What settings should I care about so that, if I do something wrong, I can always go back and run queries on Amazon without compromising what was set up for me earlier?
Let me walk you through the following simple steps:
Steps:
First, create a table in Hive using the field names in your CSV file. Let's say, for example, that your CSV file contains three fields (id, name, salary) and you want to create a table in Hive called "staff". Create it with a `CREATE TABLE` statement that declares those three columns and uses a comma as the field delimiter.
Second, now that your table exists in Hive, load the data from your CSV file into the "staff" table. Because the file is on your own computer, use `LOAD DATA LOCAL INPATH`.
Lastly, display the contents of your "staff" table (for example with `SELECT * FROM staff;`) to check that the data was loaded successfully.
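The three steps can be sketched as the HiveQL you would type into the Hive console. Below is a minimal Python sketch that only assembles those statements as strings (it does not connect to Hive). The function name `staff_statements`, the column types INT/STRING/DOUBLE, and the file path are assumptions for illustration.

```python
from typing import List


def staff_statements(csv_path: str) -> List[str]:
    """Return the HiveQL for the three steps: create, load, verify.

    Column types are assumed; adjust them to match your CSV.
    """
    create = (
        "CREATE TABLE IF NOT EXISTS staff (id INT, name STRING, salary DOUBLE) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE;"
    )
    # LOCAL makes Hive read the file from the local filesystem rather than
    # from HDFS; OVERWRITE replaces any rows already in the table.
    load = f"LOAD DATA LOCAL INPATH '{csv_path}' OVERWRITE INTO TABLE staff;"
    # Final step: look at what was loaded.
    check = "SELECT * FROM staff;"
    return [create, load, check]


if __name__ == "__main__":
    # Hypothetical path: replace it with the location of your CSV file.
    for stmt in staff_statements("/home/user/staff.csv"):
        print(stmt)
```

Running the script prints the three statements in order; paste them into the Hive console one at a time.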
Thanks.
You may try this tool: https://sourceforge.net/projects/csvtohive/?source=directory
Here is how the files are generated:
Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/
The tool then generates a Hadoop script that inserts all the selected CSV files into Hadoop, along with the corresponding Hive scripts.
Thanks Vijay
If you have a Hive setup, you can put the local dataset directly into HDFS/S3 using the Hive LOAD command.
You will need to use the LOCAL keyword when writing your load command.
Syntax for the Hive LOAD command:
`LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]`
Refer to the link below for more detailed information: https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables
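The documented form is LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (...)]. As a minimal sketch, this Python helper (the name `load_data_statement` is hypothetical) assembles that statement from its optional parts. It assumes partition values are written as string literals and does no quoting or escaping of the path.

```python
from typing import Dict, Optional


def load_data_statement(
    filepath: str,
    table: str,
    local: bool = False,
    overwrite: bool = False,
    partition: Optional[Dict[str, str]] = None,
) -> str:
    """Build: LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE t [PARTITION (...)]"""
    parts = ["LOAD DATA"]
    if local:
        # Read the file from the local filesystem instead of HDFS/S3.
        parts.append("LOCAL")
    parts.append(f"INPATH '{filepath}'")
    if overwrite:
        # Replace the existing contents of the table (or partition).
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    if partition:
        # Assumption: partition values are quoted as string literals.
        spec = ", ".join(f"{k}='{v}'" for k, v in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts) + ";"
```

For instance, `load_data_statement("/home/user/staff.csv", "staff", local=True)` gives `LOAD DATA LOCAL INPATH '/home/user/staff.csv' INTO TABLE staff;`. Without LOCAL, the path refers to a file already on the cluster's filesystem, and Hive moves it into the table's directory.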
In a CSV file, values are often enclosed in double quotes.
If we create the table with FIELDS TERMINATED BY ',', each column will keep those double quotes as part of its value.
Also, if any column value itself contains a comma, the rows will not be split correctly at all. So the correct way to create the table is to use OpenCSVSerde.
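To see why a plain comma delimiter breaks, the Python snippet below compares a naive comma split (roughly what FIELDS TERMINATED BY ',' does, with no quote handling) against the quote-aware `csv` module, which parses this row the way OpenCSVSerde is meant to. The sample row and the helper names are made up for illustration. It also builds an OpenCSVSerde table definition; `separatorChar` and `quoteChar` are that SerDe's properties, and it reads every column as STRING.

```python
import csv
import io
from typing import List

# Hypothetical CSV row: values are quoted and one of them contains a comma.
ROW = '1,"Smith, John","50000"'


def naive_split(line: str) -> List[str]:
    """Split on every comma, like a plain ',' field delimiter."""
    return line.split(",")


def csv_split(line: str) -> List[str]:
    """Quote-aware parsing with the standard csv module."""
    return next(csv.reader(io.StringIO(line)))


def opencsv_ddl(table: str, columns: List[str]) -> str:
    """CREATE TABLE using OpenCSVSerde; that SerDe treats every column as STRING."""
    cols = ", ".join(f"{c} STRING" for c in columns)
    return (
        f"CREATE TABLE {table} ({cols})\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES (\n"
        "  'separatorChar' = ',',\n"
        "  'quoteChar' = '\"'\n"
        ")\n"
        "STORED AS TEXTFILE;"
    )


if __name__ == "__main__":
    print(naive_split(ROW))
    print(csv_split(ROW))
    print(opencsv_ddl("staff", ["id", "name", "salary"]))
```

Here `naive_split` returns four fields with the quotes still attached, while `csv_split` returns the three intended values. Because OpenCSVSerde reads everything as STRING, you may want to cast columns (e.g. `CAST(salary AS DOUBLE)`) in your queries.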
You can load a local CSV file into Hive only if you are running the upload through the `hive` or `beeline` client. There is another way of doing this:
First, use `hdfs dfs -copyFromLocal` (or the equivalent `hadoop fs -copyFromLocal`) to copy the .csv data file from your local computer to somewhere in HDFS, say '/path/filename'.
Then enter the Hive console and run a script that loads the file into a Hive table. In that script, the field delimiter can be written as '\054', which is the octal ASCII code for a comma.
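The two steps can be sketched in Python as follows. This is an illustrative assumption, not a prescribed workflow: `copy_to_hdfs_cmd` builds the argument list for `hdfs dfs -copyFromLocal`, `hive_load_script` emits HiveQL that declares the delimiter as '\054' and loads the file from HDFS (no LOCAL keyword), and the function, table, and column names are hypothetical.

```python
import subprocess
from typing import List


def copy_to_hdfs_cmd(local_path: str, hdfs_path: str) -> List[str]:
    """Argument list for: hdfs dfs -copyFromLocal <local_path> <hdfs_path>."""
    return ["hdfs", "dfs", "-copyFromLocal", local_path, hdfs_path]


def hive_load_script(table: str, columns: List[str], hdfs_path: str) -> str:
    """HiveQL that creates a comma-delimited table and loads an HDFS file into it."""
    cols = ", ".join(columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols})\n"
        # '\054' is octal 54 = decimal 44 = ASCII ','; the backslash is
        # doubled so Python emits it literally for Hive to interpret.
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\054'\n"
        "STORED AS TEXTFILE;\n"
        # No LOCAL keyword: the file is already in HDFS.
        f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table};"
    )


def upload_and_print(local_path: str, hdfs_path: str) -> None:
    """Run the copy (needs the hdfs client on PATH), then print the HiveQL."""
    subprocess.run(copy_to_hdfs_cmd(local_path, hdfs_path), check=True)
    # Hypothetical columns; adjust them to match your CSV.
    print(hive_load_script("staff", ["id INT", "name STRING", "salary DOUBLE"], hdfs_path))
```

Note that LOAD DATA INPATH moves the file from '/path/filename' into the table's directory rather than copying it. The octal escape is easy to confirm in Python: `chr(0o54)` is `','`.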