Hive query output to file

Posted 2019-07-18 03:42

I am running Hive queries from Java code. Example:

"SELECT * FROM table WHERE id > 100"

How can I export the result to a file in HDFS?

Answer 1:

The following query writes the result directly to HDFS:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir' SELECT * FROM table WHERE id > 100;
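
Since the question runs its queries from Java, here is a minimal sketch of sending the same statement over JDBC. It assumes a HiveServer2 instance is reachable and the Hive JDBC driver is on the classpath. The class name HiveExport and its method names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveExport {
    // Wraps a SELECT in INSERT OVERWRITE DIRECTORY so Hive writes the rows to HDFS.
    // Note: the inputs are concatenated as-is, so only pass trusted strings.
    static String buildExportQuery(String hdfsDir, String select) {
        return "INSERT OVERWRITE DIRECTORY '" + hdfsDir + "' " + select;
    }

    // Assumption: jdbcUrl points at a running HiveServer2 and the Hive JDBC
    // driver is on the classpath; an empty password is used here for simplicity.
    static void export(String jdbcUrl, String user, String hdfsDir, String select)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, "");
             Statement stmt = conn.createStatement()) {
            // The statement returns no rows; Hive writes the files into hdfsDir.
            stmt.execute(buildExportQuery(hdfsDir, select));
        }
    }
}
```

A call might look like HiveExport.export("jdbc:hive2://localhost:10000/default", "hive", "/path/to/output/dir", "SELECT * FROM table WHERE id > 100"), where the host, port, database and user are placeholders for your own setup.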


Answer 2:

This command redirects the output to a text file of your choice:

$hive -e "select * from table where id > 10" > ~/sample_output.txt
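
If the command has to be launched from Java rather than typed in a shell, the redirection can be done with ProcessBuilder. This is only a sketch: it assumes the hive executable is on the PATH of the machine running the code, and the class name HiveCliExport is invented for the example.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class HiveCliExport {
    // Builds the argument list for `hive -e "<query>"`. No shell quoting is
    // needed because each argument is handed to the process separately.
    static List<String> command(String query) {
        return Arrays.asList("hive", "-e", query);
    }

    // Runs the Hive CLI and sends its standard output to outFile.
    // Assumption: `hive` is on the PATH. Returns the process exit code.
    static int run(String query, File outFile) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(query));
        pb.redirectOutput(outFile);                        // same effect as `> outFile`
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep Hive's log output on the console
        return pb.start().waitFor();
    }
}
```

Note that the file ends up on the local file system of the machine running the code, not in HDFS.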


Answer 3:

This puts the results in tab-delimited file(s) under a local directory:

INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/YourTableDir'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
SELECT * FROM table WHERE id > 100;


Answer 4:

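@sarath How do I overwrite the file if I want to run another SELECT * command on a different table and write to the same file?

INSERT OVERWRITE LOCAL DIRECTORY '/home/training/mydata/output' SELECT expl, count(expl) AS total
FROM (SELECT explode(splits) AS expl FROM (SELECT split(concat, ' ') AS splits FROM wordcount) t2) t3 GROUP BY expl;

This is an example for sarath's question.

The above is a word count job whose output is stored in a file in a local directory :)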



Answer 5:

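I agree with tnguyen80's response. Note that when the query contains string values, it is better to wrap the entire query in double quotes, so the string literals can use single quotes.

For example:

$hive -e "select * from table where city = 'London' and id >=100" > "/home/user/outputdirectory/city details.csv"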


Answer 6:

The ideal way is to use "INSERT OVERWRITE DIRECTORY '/pathtofile' SELECT * FROM temp WHERE id > 100" rather than "hive -e 'SELECT * FROM ...' > /filepath.txt".



Answer 7:

To save the file directly in HDFS, use the following command:

hive> insert overwrite  directory '/user/cloudera/Sample' row format delimited fields terminated by '\t' stored as textfile select * from table where id >100;

This puts the contents in the folder /user/cloudera/Sample in HDFS.



Answer 8:

Enter this line in the Hive command-line interface:

insert overwrite directory '/data/test' row format delimited fields terminated by '\t' stored as textfile select * from testViewQuery;

testViewQuery - some specific view



Answer 9:

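  1. Create an external table
  2. Insert the data into the table
  3. Optionally drop the table later; this does not delete the file, because it is an external table

Example:

Create an external table to store the query results at '/user/myName/projectA_additionaData/'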

CREATE EXTERNAL TABLE additionaData
(
     ID INT,
     latitude STRING,
     longitude STRING
)
COMMENT 'Additional Data gathered by joining of the identified cities with latitude and longitude data' 
ROW FORMAT DELIMITED FIELDS
TERMINATED BY ',' STORED AS TEXTFILE location '/user/myName/projectA_additionaData/';

Feed the query results into the temp table

 insert into additionaData 
     Select T.ID, C.latitude, C.longitude 
     from TWITER T  
     join CITY C on (T.location_name = C.location);

Drop the temp table

drop table additionaData


Answer 10:

To set the output directory and the output file format, among other options, try the following:

INSERT OVERWRITE [LOCAL] DIRECTORY directory1
[ROW FORMAT row_format] [STORED AS file_format] 
SELECT ... FROM ...

Example:

INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
ROW FORMAT DELIMITED
STORED AS PARQUET
SELECT * FROM table WHERE id > 100;
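
When the statement is generated from Java, the optional clauses in the template above can be assembled with a small helper. This is a rough sketch only: InsertDirectoryBuilder and its parameters are hypothetical names, and the inputs are concatenated without validation or escaping.

```java
final class InsertDirectoryBuilder {
    // Assembles INSERT OVERWRITE [LOCAL] DIRECTORY '<dir>'
    // [ROW FORMAT <rowFormat>] [STORED AS <fileFormat>] <select>.
    // Pass null for rowFormat or fileFormat to leave that clause out.
    static String build(boolean local, String dir, String rowFormat,
                        String fileFormat, String select) {
        StringBuilder sb = new StringBuilder("INSERT OVERWRITE ");
        if (local) {
            sb.append("LOCAL ");          // write to the local file system instead of HDFS
        }
        sb.append("DIRECTORY '").append(dir).append("'");
        if (rowFormat != null) {
            sb.append(" ROW FORMAT ").append(rowFormat);
        }
        if (fileFormat != null) {
            sb.append(" STORED AS ").append(fileFormat);
        }
        return sb.append(' ').append(select).toString();
    }
}
```

For instance, build(true, "/home/hadoop/YourTableDir", "DELIMITED FIELDS TERMINATED BY '\\t'", "TEXTFILE", "SELECT * FROM table WHERE id > 100") produces the tab-delimited local export from Answer 3 as a single line.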


Source: Hive query output to file
Tags: hadoop hive