How to get the input file name as a column within a Hive query

Posted 2019-03-15 07:15

I have a Hive external table that is mapped to a directory. This directory contains several files.

I want to run a query that finds the name of the file containing the user "abc", something like:

 select file_name , usr from usrs_tables where usr = "abc"

But of course the data itself doesn't include the file name.

In MapReduce I can do it with:

FileSplit fileSplit = (FileSplit)context.getInputSplit();
String filename = fileSplit.getPath().getName();
System.out.println("File name "+filename);
System.out.println("Directory and File name"+fileSplit.getPath().toString());

How can I do it in Hive?

Tags: hadoop hive
3 answers
三岁会撩人
#2 · 2019-03-15 07:35

Every table in Hive has two virtual columns:

  1. INPUT__FILE__NAME
  2. BLOCK__OFFSET__INSIDE__FILE

INPUT__FILE__NAME gives the name of the input file a record was read from, and BLOCK__OFFSET__INSIDE__FILE is the current global file position. To find the file that each record came from, select the INPUT__FILE__NAME column alongside the regular columns. This feature is available as of Hive 0.8.0. A small example is given below.

Query

select INPUT__FILE__NAME, name from customer_data;

This returns the file name for each record. If you only want the list of files that back a Hive table, the query below will help.

select distinct(INPUT__FILE__NAME) from customer_data;
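
Applied to the table from the question, the same idea might look like the sketch below. The names usrs_tables and usr are taken from the question, and the queries assume the table is readable as a normal Hive table; treat them as an illustration rather than a tested solution.

```sql
-- usrs_tables and usr are the table and column names used in the question.
-- INPUT__FILE__NAME is the virtual column holding the file each row was read from.
select INPUT__FILE__NAME, usr
from usrs_tables
where usr = 'abc';

-- If only the set of files containing that user is needed, drop the duplicates.
select distinct INPUT__FILE__NAME
from usrs_tables
where usr = 'abc';
```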
The star"
#3 · 2019-03-15 07:45

Yes, you can retrieve the file the record was found in using the virtual column named INPUT__FILE__NAME, for example:

select INPUT__FILE__NAME, id, name from users where ...;

yields something like:

hdfs://localhost.localdomain:8020/user/hive/warehouse/users/users1.txt    2    user2
hdfs://localhost.localdomain:8020/user/hive/warehouse/users/users2.txt    42    john.doe

If necessary, use Hive's built-in string functions to trim the host and directories from the URI.
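
One way to do that trimming, as a minimal sketch, is Hive's built-in regexp_extract function. The alias file_name is only an illustrative name, and the users table with its id and name columns comes from the example above.

```sql
-- Keep only the last path component of the file URI.
-- '[^/]+$' matches the trailing run of non-slash characters;
-- index 0 asks regexp_extract for the whole match.
select regexp_extract(INPUT__FILE__NAME, '[^/]+$', 0) as file_name,
       id,
       name
from users;
```

For the sample rows shown earlier, this should leave values such as users1.txt in place of the full hdfs:// URI.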

You can find the documentation on virtual columns here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
