Hadoop - Load Hive tables using PIG

Posted 2019-08-31 11:56

I want to load Hive tables using Pig. I think this can be done through HCatLoader, but my input is XML files, which I have to load with XMLLoader. Can I use both options together when loading XML files in Pig?

I am extracting the data from the XML files using my own UDF, and once all the data is extracted, I have to load it from Pig into Hive tables.

I can't use Hive to extract the XML data: the XML I receive is quite complex, so I wrote my own UDF to parse it. Any suggestions or pointers on how to load Hive tables from Pig data?

I am using AWS.

2 Answers
一纸荒年 Trace。
#2 · 2019-08-31 12:10

You can store data from Pig into Hive tables using HCatStorer. For example:

REGISTER 's3n://bucket/path/xmlUDF.jar';
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();
STORE xml INTO 'database.table' USING org.apache.hive.hcatalog.pig.HCatStorer();

Your question isn't quite clear. Are you hoping to work with both the XML and the Hive data within Pig, do something with them, and then store the result in Hive? Or are you just trying to store the XML data in Hive and work with it there?
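
If it is the first case, each LOAD statement names its own loader, so a custom XML loader and HCatLoader can be used side by side in one script. A minimal sketch, under several assumptions that are not from the question: a hypothetical existing Hive table database.lookup with an id column, a hypothetical target table database.result, and an XML loader whose output fits the two-field schema shown. The script would likely need to be started with pig -useHCatalog so that the HCatalog jars are on the classpath.

```pig
REGISTER 's3n://bucket/path/xmlUDF.jar';

-- XML side: the custom loader from the example above.
-- The AS schema (id, payload) is an assumption made for illustration.
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF() AS (id:chararray, payload:chararray);

-- Hive side: read an existing table through HCatalog (table name is hypothetical).
lookup = LOAD 'database.lookup' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Join on the shared key, then keep only the columns the target table defines.
joined = JOIN xml BY id, lookup BY id;
result = FOREACH joined GENERATE xml::id AS id, xml::payload AS payload;

-- HCatStorer does not create the table: it must already exist in Hive,
-- with column names and types that match the relation being stored.
STORE result INTO 'database.result' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

In the second case the JOIN and the HCatLoader line drop out, and what remains is the three-line example above.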

Ridiculous、
#3 · 2019-08-31 12:11

You can STORE the loaded data into a text file using a delimiter (a comma, for example) and then create an external table in Hive pointing to the file location.

CREATE EXTERNAL TABLE YOURTABLE (schema)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/your/file/directory';
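
The Pig half of this approach might look like the sketch below: the relation is written with PigStorage using the same delimiter and directory that the external table declares. The jar path, input path and xmlUDF loader are borrowed from the other answer as placeholders, and the output path simply mirrors the placeholder location in the table definition.

```pig
REGISTER 's3n://bucket/path/xmlUDF.jar';

-- Load the XML with the custom loader (placeholder names).
xml = LOAD 's3n://bucket/pathtofiles' USING xmlUDF();

-- Write comma-delimited text files into the directory the external table points at.
-- Pig refuses to run the STORE if this output directory already exists.
STORE xml INTO '/your/file/directory' USING PigStorage(',');
```

If any extracted field can itself contain a comma, a delimiter that never occurs in the data (a tab or '\u0001', say) is the safer choice, as long as the same character is used in FIELDS TERMINATED BY.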