Is there a module for Python to open IBM SPSS (i.e. .sav) files? It would be great if there's something up-to-date which doesn't require any additional dll files/libraries.
I had the same question as @Pyderman about how to update this for pandas (>0.16). This is what I came up with:
You could use a Python interface to R and then import the data using read.spss in library(foreign).
But the benefit of using the IBM libraries is that they get this rather complex binary file format right. They are free, relieve you of the burden of writing code for this format, and the license permits you to redistribute them. What more could you ask?
I have released a Python package "pyreadstat" that reads SPSS (sav, zsav and por), Stata and SAS files. It is a wrapper around the C library ReadStat, so it is very fast. ReadStat is the library behind the R package haven, which is widely used and very robust.
The package is self-contained. It does not require R (no need to install an additional application) and it does not depend on IBM dlls or other external libraries.
For example, to read an SPSS sav file you call pyreadstat.read_sav with the path to the file; it returns two objects, df and meta.
df is a pandas DataFrame. meta contains metadata such as variable labels and value labels. read_sav reads both sav and zsav (compressed) files. There is also a function read_por for old por (portable) files.
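The call described above can be sketched as follows. This is only a minimal sketch, assuming pyreadstat is installed (for example via pip). The helper names read_sav_file and variable_labels are made up for illustration and are not part of the package; read_sav, column_names and column_labels are the pyreadstat names used.

```python
def read_sav_file(path):
    """Read a .sav or .zsav file and return (df, meta)."""
    # Imported inside the function so this snippet can be loaded
    # even on a machine where pyreadstat is not installed yet.
    import pyreadstat

    # read_sav returns a pandas DataFrame and a metadata object.
    df, meta = pyreadstat.read_sav(path)
    return df, meta


def variable_labels(meta):
    """Illustrative helper: map each column name to its variable label."""
    return dict(zip(meta.column_names, meta.column_labels))


# Usage (the file name is hypothetical):
# df, meta = read_sav_file("survey.sav")
# print(df.head())
# print(variable_labels(meta))
```

Wrapping the call in a small function is just a convenience here; calling pyreadstat.read_sav directly works the same way.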
You can find it here: https://github.com/Roche/pyreadstat
Depending on what you want to do--process data using R-related commands from rpy2, or switch to Python--the solution provided by @Spacedman on a related thread might easily be adapted to suit your needs.
Otherwise, pandas includes a convenient wrapper for rpy2 (in older versions only; the pandas.rpy module was deprecated in pandas 0.16). Here is an example of use with Peat and Barton's weights.sav data set:
Here are some packages you may be interested in:
- savReaderWriter on Bitbucket
- savReaderWriter 3.4.2 in the Python Package Index