Is there a way to see the contents of an ORC file, the format that Hive 0.11 and above use? I usually cat gz files and decompress them to see their contents,
e.g.: cat part-0000.gz | pigz -d | more
Note: pigz is a parallel gzip implementation.
I would like to know if there is something similar for ORC files.
The ORC file dump utility comes with Hive (0.11 or higher):
hive --orcfiledump <hdfs-location-of-orc-file>
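By default the dump utility prints file metadata rather than rows. As a rough sketch, and assuming Hive 1.1.0 or later (where the -d option dumps row data), a small wrapper can be paged much like the gz pipeline in the question. The function name orc_cat and the example path are made up for illustration.

```shell
# Hypothetical wrapper around the Hive ORC dump utility.
# Assumes Hive 1.1.0 or later, where -d prints row data instead of metadata.
orc_cat() {
    hive --orcfiledump -d "$1"
}

# Usage (the path is a placeholder):
#   orc_cat /path/to/file.orc | more
```

On older Hive versions without -d, only the metadata dump is available from this utility.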
There is now also a native executable for Linux and macOS that prints the contents of an ORC file as JSON. See the ORC project (http://orc.apache.org/) and build the C++ tools.
% orc-contents examples/TestOrcFile.test1.orc
There is also a native metadata tool:
% orc-metadata ../examples/TestOrcFile.test1.orc
The ORC project also has a standalone uber jar that can do the same from Java.
% java -jar orc-tools-1.2.3-uber.jar data myfile.orc
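To get something closer to the cat | pigz -d | more habit from the question, the jar invocation can be wrapped in a small function that shows only the first few rows. This is a minimal sketch: orc_peek and ORC_TOOLS_JAR are hypothetical names, and it assumes the data command writes one row per line.

```shell
# Hypothetical helper: show the first N rows of an ORC file.
# ORC_TOOLS_JAR is an assumed variable pointing at the uber jar;
# it falls back to the jar name used in the answer above.
orc_peek() {
    file="$1"
    rows="${2:-20}"   # default to 20 lines when no count is given
    java -jar "${ORC_TOOLS_JAR:-orc-tools-1.2.3-uber.jar}" data "$file" | head -n "$rows"
}

# Usage:
#   orc_peek myfile.orc 5
```

Piping to head keeps the output short for large files; replace it with more or less to page through everything.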