python read_fwf error: 'dtype is not supported

Using python 2.7.5 and pandas 0.12.0, I'm trying to import fixed-width-font text files into a DataFrame with 'pd.io.parsers.read_fwf()'. The values I'm importing are all numeric, but it's important that leading zeros be preserved, so I'd like to specify the dtype as string rather than int.

According to the documentation for this function, the dtype attribute is supported in read_fwf, but when I try to use it:

data= pd.io.parsers.read_fwf(file, colspecs = ([79,81], [87,90]), header = None, dtype = {0: np.str, 1: np.str})

I get the error:

ValueError: dtype is not supported with python-fwf parser

I've tried as many variations as I can think of for setting 'dtype = something', but all of them return the same message.

Any help would be much appreciated!

标签： python parsing pandas

3条回答

\"骚年 ilove

2楼-- · 2019-06-15 00:45

Instead of specifying dtypes, specify a converter for the column you want to keep as str, building on @TomAugspurger's example:

from io import StringIO
import pandas as pd
data = StringIO(u"""
121301234
121300123
121300012
""")

pd.read_fwf(data, colspecs=[(0,3),(4,8)], converters = {1: str})

Leads to

    \n Unnamed: 1
0  121       0123
1  121       0012
2  121       0001

Converters are a mapping from a column name or index to a function to convert the value in the cell (eg. int would convert them to integer, float to floats, etc)

0人赞添加讨论(0) 举报

甜甜的少女心

3楼-- · 2019-06-15 00:52

The documentation is probably incorrect there. I think the same base docstring is used for several readers. As for as a workaround, since you know the widths ahead of time, I think you can prepend the zeros after the fact.

With this file and widths [4, 5]

121301234
121300123
121300012

we get:

In [38]: df = pd.read_fwf('tst.fwf', widths=[4,5], header=None)

In [39]: df
Out[39]: 
      0     1
0  1213  1234
1  1213   123
2  1213    12

To fill in the missing zeros, would this work?

In [45]: df[1] = df[1].astype('str')

In [53]: df[1] = df[1].apply(lambda x: ''.join(['0'] * (5 - len(x))) + x)

In [54]: df
Out[54]: 
      0      1
0  1213  01234
1  1213  00123
2  1213  00012

The 5 in the lambda above comes from the correct width. You'd need to select out all the columns that need leading zeros and apply the function (with the correct width) to each.

0人赞添加讨论(0) 举报

Luminary・发光体

4楼-- · 2019-06-15 01:00

This will work fine after pandas 0.20.2 version.

from io import StringIO
import pandas as pd
import numpy as np
data = StringIO(u"""
121301234
121300123
121300012
""")
pd.read_fwf(data, colspecs=[(0,3),(4,8)], header = None, dtype = {0: np.str, 1: np.str})

Output:

     0     1
0  NaN   NaN
1  121  0123
2  121  0012
3  121  0001

0人赞添加讨论(0) 举报

python read_fwf error: 'dtype is not supported

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间