Pandas fails with correct data type while reading

I have a SAS dataset and when I run it I get the following output on SAS:

I also have the following Python code which gets the .sas7bdat file and displays the output, i.e. here the first five observations.

import pandas as pd
file_name = "cars.sas7bdat"
my_df = pd.read_sas(file_name)
my_df = my_df.head()
print(my_df)

As you can see, it doesn't work correctly when it comes to integer data types. The CYL and WGT variables are integers, but they are not displayed correctly when I use pandas' read_sas function.

Any idea what the heck is going on here?

Tags: python pandas types sas

2 Answers

Juvenile、少年°

#2 · 2019-08-07 12:23

I finally solved the issue. It definitely seems to be a bug in pandas. I used the sas7bdat library directly, installing it with:

pip install sas7bdat

Then I ran the following code:

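from sas7bdat import SAS7BDAT

# file_path was not defined in the original snippet; it is assumed here to be
# the directory containing the file, including a trailing path separator.
file_path = ""
file_name = file_path + "cars.sas7bdat"

# Read the file with the sas7bdat library instead of pandas.read_sas
foo = SAS7BDAT(file_name)
my_df = foo.to_data_frame()
my_df = my_df.head()  # first five observations
print(my_df)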

After running the above code, I get the following output in Python:

So the output is now displayed with the correct data types.

I hope the pandas developers find a solution for the bug described above.


Melony?

#3 · 2019-08-07 12:30

SAS represents all numbers as 64-bit (8-byte) floating point numbers, but you can save disk space by telling it to store fewer than 8 bytes. The dataset you posted did this for CYL and WGT.

When SAS reads the dataset back from disk, it sets the missing least significant bytes to binary zeros. Apparently read_sas didn't understand this and, instead of setting the missing bytes to binary zeros, did something else. Hence the seemingly random data.
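To make the truncation idea concrete, here is a minimal sketch using only Python's struct module. It is not how SAS or pandas actually implement this; the helper names truncate_double and restore_double are made up for illustration, and big-endian byte order is assumed only so that the most significant bytes come first.

```python
import struct

def truncate_double(value: float, length: int) -> bytes:
    """Keep only the `length` most significant bytes of a big-endian double."""
    return struct.pack(">d", value)[:length]

def restore_double(stored: bytes) -> float:
    """Pad the missing least significant bytes with zeros, then decode."""
    padded = stored + b"\x00" * (8 - len(stored))
    return struct.unpack(">d", padded)[0]

stored = truncate_double(8.0, 3)   # store 3 bytes instead of 8
restored = restore_double(stored)  # zero-padding recovers the value
print(stored.hex(), restored)
```

Small integers survive this round trip exactly because their low-order mantissa bytes are already zero, which is why shortening the stored length is safe for columns like these.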

The first value of CYL is 8, which as an IEEE floating point number has the hex representation

40 20 00 00 00 00 00 00

The value you displayed, 8.00046, would instead be something like this:

40 20 00 06 07 80 FD C1
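If you want to check the two byte patterns yourself, the following sketch decodes them as big-endian IEEE 754 doubles with the struct module. The function name decode_be_double is just an illustrative helper, not part of pandas or SAS.

```python
import struct

def decode_be_double(hex_bytes: str) -> float:
    """Decode 8 hex-encoded bytes as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_bytes))[0]

expected = decode_be_double("40 20 00 00 00 00 00 00")
garbled = decode_be_double("40 20 00 06 07 80 FD C1")

print(expected)            # the clean value
print(garbled)             # slightly above the clean value
print(garbled - expected)  # size of the error from the non-zero low bytes
```

The two patterns share their leading bytes and differ only in the trailing ones, which is consistent with the low-order bytes having been filled with something other than zeros.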
