Decompression 'SNAPPY' not available with

Posted 2020-03-01 08:31

Question:

I am trying to use fastparquet to open a file, but I get the error:

RuntimeError: Decompression 'SNAPPY' not available.  Options: ['GZIP', 'UNCOMPRESSED']

I have the following installed and have restarted my interpreter:

python                    3.6.5                hc3d631a_2  
python-snappy             0.5.2                    py36_0    conda-forge
snappy                    1.1.7                hbae5bb6_3  
fastparquet               0.1.5                    py36_0    conda-forge

Everything installed without errors. I wasn't sure whether I needed snappy or python-snappy, so I installed one, saw no change, and then installed the other, still with no success. Every related issue I have found was fixed by installing snappy, but I still get this error with both packages installed. Any help would be appreciated.

Answer 1:

Run:

pip install python-snappy
pip install pyarrow 

It should do the trick.

I think you lack the pyarrow package.

If pip gives an error, use conda instead: conda install python-snappy, or, if that still fails, conda install -c conda-forge python-snappy.
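After installing, it can help to confirm that the packages are visible to the same interpreter that runs fastparquet, since pip and conda may target different environments. The following is a minimal sketch using only the standard library; the helper names module_available and report are hypothetical and not part of any of these packages.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if this interpreter can locate the top-level module `name`."""
    return importlib.util.find_spec(name) is not None


def report(modules=("snappy", "fastparquet", "pyarrow")):
    """Map each module name to whether it can be found (nothing is imported)."""
    return {name: module_available(name) for name in modules}


if __name__ == "__main__":
    # The path shows which environment this interpreter belongs to.
    print("Interpreter:", sys.executable)
    for name, found in report().items():
        print(f"{name:12s} {'found' if found else 'MISSING'}")
```

If snappy shows as MISSING here while conda list shows python-snappy, the interpreter is probably running from a different environment than the one the package was installed into.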



Answer 2:

You need to install python-snappy, as Catbuilts's answer says. However, python-snappy is only a wrapper around the C implementation of snappy, which must also be installed on your machine. This is covered in this answer about installing snappy-c.

Assuming you have a DEB-based system, such as Ubuntu, you can get it with:

sudo apt-get install libsnappy-dev
python3 -m pip install --user python-snappy

To test it, you can try the following script:

import pandas as pd
import snappy  # not used directly; importing it confirms python-snappy is reachable
from fastparquet import write, ParquetFile

df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
# print(df.head())  # uncomment to inspect the input
write("/tmp/deleteme", df, compression="SNAPPY")  # fails if SNAPPY is unavailable
df_parquet = ParquetFile("/tmp/deleteme").to_pandas()
print(df_parquet.head())
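
A narrower check, independent of fastparquet and pandas, is to round-trip a few bytes through python-snappy itself. This is only a sketch: the function name check_snappy_binding is hypothetical, and it assumes the module exposes compress and decompress, as python-snappy does.

```python
def check_snappy_binding():
    """Return 'missing', 'broken', or 'ok' for the python-snappy binding."""
    try:
        import snappy  # provided by the python-snappy package
    except ImportError:
        # Raised when the module is absent or its native extension cannot load.
        return "missing"
    if not (hasattr(snappy, "compress") and hasattr(snappy, "decompress")):
        # Some other package named `snappy` is being imported instead.
        return "broken"
    payload = b"parquet " * 100
    restored = snappy.decompress(snappy.compress(payload))
    return "ok" if restored == payload else "broken"


print(check_snappy_binding())
```

A result of "missing" right after a successful install usually points at the wrong environment or at the underlying C library not being found.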