Python sas7bdat module usage

2020-07-07 05:45发布

问题:

I have to dump data from SAS datasets. I found a Python module called sas7bdat.py that says it can read SAS .sas7bdat datasets, and I think it would be simpler and more straightforward to do the project in Python rather than SAS due to the other functionality required. However, the help(sas7bdat) in interactive Python is not very useful and the only example I was able to find to dump a dataset is as follows:

import sas7bdat
from sas7bdat import *
# following line is sas dataset to convert
foo = SAS7BDAT('/support/sas/locked_data.sas7bdat')
#following line is txt file to create
foo.convertFile('/support/textfiles/locked_data.txt','\t')

This doesn't do what I want because a) it uses the SAS variable names as column headers and I need it to use the variable labels, and b) it uses "nan" to denote missing numeric values where I'd rather just leave the value blank.

Can anyone point me to some useful documentation on the methods included in sas7bdat.py? I've Googled every permutation of key words that I could think of, with no luck. If not, can someone give me an example or two of using readColumnAttributes(), readColumnLabels(), and/or readColumnNames()?

Thanks, all.

回答1:

This is only a partial answer as I've found no [easy to read] concrete documentation.

You can view the source code here

This shows some basic info regarding what arguments the methods require, such as:

  • readColumnAttributes(self, colattr)
  • readColumnLabels(self, collabs, coltext, colcount)
  • readColumnNames(self, colname, coltext)

I think most of what you are after is stored in the "header" class returned when creating an object with SAS7BDAT. If you just print that class you'll get a lot of info, but you can also access class attributes as well. I think most of what you may be looking for would be under foo.header.cols. I suspect you use various header attributes as parameters for the methods you mention.

Maybe something like this will get you closer?

from sas7bdat import SAS7BDAT
foo = SAS7BDAT(inFile) #your file here...

for i in foo.header.cols:
    print '"Atrributes"', i.attr
    print '"Labels"', i.label
    print '"Name"', i.name

edit: Unrelated to this specific question, but the type() and dir() commands come in handy when trying to figure out what is going on in an unfamiliar class/library



回答2:

I know I'm late for the answer, but in case someone searches for similar question. The best option is:

import sas7bdat
from sas7bdat import *
foo = SAS7BDAT('/support/sas/locked_data.sas7bdat')
# This converts to dataframe:
ds = foo.to_data_frame()


回答3:

As time passes, solutions become easier. I think this one is easiest if you want to work with pandas:

import pandas as pd
df = pd.read_sas('/support/sas/locked_data.sas7bdat')

Note that it is easy to get a numpy array by using df.values



回答4:

Personally I think the better approach would be to export the data using SAS then process the external file as needed using Python.

In SAS, you can do this...

libname datalib "/support/sas";
filename sasdump "/support/textfiles/locked_data.txt";

proc export
    data = datalib.locked_data
    outfile = sasdump
    dbms = tab
    label
    replace;
run;

The downside to this is that while the column labels are used rather than the variable names, the labels are enclosed in double quotes. When processing in Python, you may need to programmatically remove them if they cause a problem. I hope that helps even though it doesn't use Python like you wanted.



标签: python sas