I am trying to use fastparquet to open a file, but I get the error:
RuntimeError: Decompression 'SNAPPY' not available. Options: ['GZIP', 'UNCOMPRESSED']
I have the following packages installed and have restarted my interpreter:
python 3.6.5 hc3d631a_2
python-snappy 0.5.2 py36_0 conda-forge
snappy 1.1.7 hbae5bb6_3
fastparquet 0.1.5 py36_0 conda-forge
Everything installed without errors. I wasn't sure whether I needed snappy or python-snappy, so I installed one, and when that didn't fix it I installed the other, still without success. Every related issue I have found was resolved by installing snappy, but I still get this error with both snappy packages installed. Any help would be appreciated.
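A quick way to narrow this down is to check, from the same interpreter that runs fastparquet, whether a module named snappy can be imported at all. This is a sketch under assumptions not stated in the question: that fastparquet only registers the SNAPPY codec when import snappy succeeds, and that a usable module exposes compress and decompress. check_snappy is a hypothetical helper name.

```python
import importlib
import sys


def check_snappy():
    """Return True if an importable 'snappy' module exposes compress/decompress."""
    # Show which interpreter is running; an environment mismatch is a common culprit.
    print("Interpreter:", sys.executable)
    try:
        snappy = importlib.import_module("snappy")
    except ImportError as exc:
        # Typically raised both when python-snappy is missing and when its
        # compiled extension cannot load the C library (libsnappy).
        print("Cannot import 'snappy':", exc)
        return False
    print("Loaded from:", getattr(snappy, "__file__", "<unknown>"))
    has_api = hasattr(snappy, "compress") and hasattr(snappy, "decompress")
    print("Has compress/decompress:", has_api)
    return has_api


if __name__ == "__main__":
    check_snappy()
```

If the printed interpreter path is not inside the conda environment where python-snappy was installed, the packages went into a different environment than the one running your code.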
Run:
pip install python-snappy
pip install pyarrow
It should do the trick.
I think you are missing the pyarrow package.
If pip gives you an error, use conda instead (i.e., conda install python-snappy, or if you still get errors, conda install -c conda-forge python-snappy).
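If pip or conda reports success but the import still fails, the packages may have been installed for a different interpreter. A minimal sketch that ties pip to the interpreter running the code by invoking python -m pip through sys.executable; install_cmd and install are hypothetical helper names, and nothing is installed until install() is actually called.

```python
import subprocess
import sys


def install_cmd(*packages):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(*packages):
    """Run the pip command; raises CalledProcessError if pip fails."""
    cmd = install_cmd(*packages)
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

For example, install("python-snappy", "pyarrow") runs the two pip commands above against the current interpreter. With conda, the equivalent is to run conda install with the target environment activated.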
You need to install python-snappy, as stated in Catbuilts' answer. However, python-snappy is only a wrapper around the snappy C library, which must also be installed on your computer; installing the C library is addressed in this answer about installing snappy-c.
Assuming you have a Debian-based (DEB) system, such as Ubuntu, you can install both with:
sudo apt-get install libsnappy-dev
python3 -m pip install --user python-snappy
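To confirm that the C library itself is visible, the standard library's ctypes.util.find_library can be asked for "snappy". This is only a sketch: find_library searches the usual system locations, so a libsnappy installed inside a conda environment may not be reported even when python-snappy can load it. find_libsnappy is a hypothetical helper name.

```python
import ctypes
import ctypes.util


def find_libsnappy():
    """Return the name of a loadable libsnappy, or None if none is found."""
    name = ctypes.util.find_library("snappy")  # e.g. 'libsnappy.so.1' on Linux
    if name is None:
        return None
    try:
        ctypes.CDLL(name)  # make sure the shared library actually loads
    except OSError:
        return None
    return name


if __name__ == "__main__":
    print("libsnappy:", find_libsnappy() or "not found")
```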
To test it, you can try the following script:
import pandas as pd
import snappy  # not used directly, but the python-snappy module must be importable
from fastparquet import write, ParquetFile

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
# print(df.head())  # inspect the original data

# Write with SNAPPY compression, then read the file back
write("/tmp/deleteme", df, compression="SNAPPY")
df_parquet = ParquetFile("/tmp/deleteme").to_pandas()
print(df_parquet.head())
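If that script still fails, it can help to exercise python-snappy on its own, without pandas or fastparquet involved. A minimal round-trip check, assuming python-snappy's module-level compress and decompress functions; snappy_roundtrip is a hypothetical name, and it returns None when no snappy module can be imported.

```python
def snappy_roundtrip(payload=b"fastparquet " * 100):
    """Compress and decompress payload with python-snappy.

    Returns True on a successful round trip, False if the imported 'snappy'
    module lacks the expected API, and None if it cannot be imported.
    """
    try:
        import snappy
    except ImportError:
        return None
    if not (hasattr(snappy, "compress") and hasattr(snappy, "decompress")):
        return False
    compressed = snappy.compress(payload)
    return snappy.decompress(compressed) == payload


if __name__ == "__main__":
    print("snappy round trip:", snappy_roundtrip())
```

A False result may indicate that some other package named snappy is being imported instead of python-snappy.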