exporting Hive table to csv in hdfs

Published 2019-04-22 01:03

Question:

I know there is a known issue with delimiters when saving a table to csv (or other text file) in Hive. So I'm wondering if you guys can help me get around that.
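As I understand it, the underlying issue is that Hive's default delimiter for text tables is the non-printing \001 (Ctrl-A) character, so simply copying the warehouse files out does not give usable csv. Something like the following (the warehouse path is just illustrative) shows the ^A separators:

    hdfs dfs -cat /apps/hive/warehouse/table_a/000000_0 | head -1 | cat -v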

I have an existing table (Table A) and I would like to save it in csv format to hdfs. From reading other responses I believe I would have to first create an external table (but I'm not sure how the whole thing would look).

Can anyone help?

Answer 1:

Try this in hive shell:

INSERT OVERWRITE LOCAL DIRECTORY '/path/to/hive/csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM hivetablename;
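Since the question asks for hdfs rather than the local filesystem, the same statement also works without the LOCAL keyword, writing the csv files directly into HDFS (a sketch, assuming a Hive version recent enough to support ROW FORMAT on a non-LOCAL directory — older releases only allowed it with LOCAL; the output path is a placeholder):

    INSERT OVERWRITE DIRECTORY '/user/hive/csv_out'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM hivetablename;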

Change /path/to/hive/csv to the directory where you want the csv files written, and hivetablename to the Hive table you want exported in csv format.
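Note that Hive writes the result as one or more plain files (typically named 000000_0, 000001_0, and so on) inside that directory rather than as a single .csv file; if you want one file you can concatenate them afterwards, e.g.:

    cat /path/to/hive/csv/* > /path/to/hive/combined.csv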



Answer 2:

This three-step process worked fine for me:

  1. In Hive, create a new temporary table stored as textfile

    CREATE TABLE temp_table( id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
    
  2. Also in Hive, write the source data into the temporary table

    INSERT OVERWRITE TABLE temp_table
    SELECT id, name FROM source_table;
    
  3. From the command line, copy the Hive table's files from their location in HDFS to your local filesystem; inspection should confirm they are comma-delimited (rename to .csv if desired; see the example after this list)

    hdfs dfs -copyToLocal /apps/hive/warehouse/temp_table/* /tmp/local_dir/
    
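For example, to give the copied file a csv extension and spot-check its contents (assuming a single output file named 000000_0; adjust if there are several):

    mv /tmp/local_dir/000000_0 /tmp/local_dir/temp_table.csv
    head -5 /tmp/local_dir/temp_table.csv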

If you run SHOW CREATE TABLE temp_table in Hive, it will tell you the exact location of the table in HDFS, e.g.

| LOCATION                                                        |
|   'hdfs://hadoop_cluster/apps/hive/warehouse/temp_table'        |
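Alternatively, DESCRIBE FORMATTED prints the same Location field among the rest of the table's metadata:

    DESCRIBE FORMATTED temp_table;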


Answer 3:

To use an external table in Hive, you can follow the steps below:

  1. Create external table in hive

    CREATE EXTERNAL TABLE external_table( number INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hive/external/mytable/';

  2. Load the data file from the local filesystem into the HDFS location

    hadoop fs -put /home/user1/Desktop/filename.csv /user/hive/external/mytable/

These two steps should solve your problem.
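Note that as written these steps load an existing csv file into Hive. To go the other direction and export an existing table as csv into HDFS (what the question asks), you can instead populate the external table from your source table; the comma-delimited files then appear under its LOCATION in HDFS (a sketch — table_a and its column list are placeholders for your actual table):

    INSERT OVERWRITE TABLE external_table
    SELECT number, name FROM table_a;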



Tags: hadoop, hive