I've been hunting around for a solution to this question.
It appears to me that there is no way to read and write the Parquet format from a Java program without pulling in dependencies on HDFS and Hadoop. Is this correct?
I want to read and write on a client machine, outside of a Hadoop cluster.
I started to get excited about Apache Drill, but it appears that it must run as a separate process. What I need is an in-process ability to read and write a file using the Parquet format.
You can write the Parquet format outside a Hadoop cluster using the Parquet Java API.
Here is sample code in Java that writes the Parquet format to the local disk.
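This is a minimal sketch based on parquet-avro's `AvroParquetWriter`; the schema file name `avro_format.json` matches the file shown further down, while the output path and the record fields (`id`, `name`) are placeholders of my own.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class LocalParquetWrite {

    public static void main(String[] args) throws IOException {
        // Load the Avro schema that defines the record layout
        // (see avro_format.json below).
        Schema schema = new Schema.Parser().parse(new File("avro_format.json"));

        // A plain local filesystem path -- no HDFS URI, no cluster.
        Path outputPath = new Path("/tmp/sample.parquet");

        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(outputPath)
                .withSchema(schema)
                .withCompressionCodec(CompressionCodecName.SNAPPY)
                .build()) {

            // Write a single record; the field names are placeholders
            // matching the sample schema below.
            GenericRecord record = new GenericData.Record(schema);
            record.put("id", 1);
            record.put("name", "example");
            writer.write(record);
        }
    }
}
```

Note that `parquet-avro` still uses some Hadoop classes (`Path`, `Configuration`), so the Hadoop client jars must be on the classpath, but nothing here connects to HDFS or needs a running cluster.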
avro_format.json:
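The original schema file isn't reproduced here, so this is just a guess at a minimal Avro schema matching the placeholder fields used above:

```json
{
  "type": "record",
  "name": "SampleRecord",
  "fields": [
    {"name": "id",   "type": "int"},
    {"name": "name", "type": "string"}
  ]
}
```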
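Since the question asks about reading as well, here is a matching sketch using `AvroParquetReader`; again, the input path is a placeholder and assumes the file written above:

```java
import java.io.IOException;

import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;

public class LocalParquetRead {

    public static void main(String[] args) throws IOException {
        // Read the file back straight from the local disk.
        try (ParquetReader<GenericRecord> reader = AvroParquetReader
                .<GenericRecord>builder(new Path("/tmp/sample.parquet"))
                .build()) {
            GenericRecord record;
            // read() returns null once the last record has been consumed.
            while ((record = reader.read()) != null) {
                System.out.println(record);
            }
        }
    }
}
```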
Hope this helps.