Using Python 2.7.5 and pandas 0.12.0, I'm trying to import fixed-width text files into a DataFrame with 'pd.io.parsers.read_fwf()'. The values I'm importing are all numeric, but it's important that leading zeros be preserved, so I'd like to specify the dtype as string rather than int.
According to the documentation for this function, the dtype argument is supported in read_fwf, but when I try to use it:
data = pd.io.parsers.read_fwf(file, colspecs=([79, 81], [87, 90]), header=None, dtype={0: np.str, 1: np.str})
I get the error:
ValueError: dtype is not supported with python-fwf parser
I've tried as many variations as I can think of for setting 'dtype = something', but all of them return the same message.
Any help would be much appreciated!
Instead of specifying dtypes, specify a converter for the column you want to keep as str, building on @TomAugspurger's example:
Leads to
Converters are a mapping from a column name or index to a function that converts the value in each cell (e.g. int would convert them to integers, float to floats, etc.).
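For illustration, here is a minimal sketch of the converters approach. The sample data and the widths [4, 5] are made up for the example (they are not the asker's file), and the names SAMPLE and df_conv are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: two fixed-width columns, 4 and 5 characters wide,
# with no separator between them.
SAMPLE = (
    "000100012\n"
    "002000345\n"
    "030006789\n"
)

# Map each column index to `str`, so each cell is kept as the text that
# was read instead of being inferred as an integer.
df_conv = pd.read_fwf(
    io.StringIO(SAMPLE),
    widths=[4, 5],
    header=None,
    converters={0: str, 1: str},
)

print(df_conv)
```

With header=None the columns are labelled 0 and 1, which is why the converter keys are integers here; with a header row you would use the column names instead.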
The documentation is probably incorrect there; I think the same base docstring is used for several readers. As for a workaround, since you know the widths ahead of time, I think you can prepend the zeros after the fact.
With this file and widths [4, 5]
we get:
To fill in the missing zeros, would this work?
The 5 in the lambda above comes from the correct width. You'd need to select out all the columns that need leading zeros and apply the function (with the correct width) to each.
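A minimal sketch of that zero-filling idea, assuming a made-up file with widths [4, 5] and no missing values (with NaNs the columns would be read as floats and this padding would not give the right result). The names SAMPLE, COLUMN_WIDTHS and df_padded are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: two fixed-width columns, 4 and 5 characters wide.
SAMPLE = (
    "000100012\n"
    "002000345\n"
    "030006789\n"
)

# Column index -> original field width, used to restore the leading zeros.
COLUMN_WIDTHS = {0: 4, 1: 5}

# Read with default type inference: the values come back as integers,
# so the leading zeros are lost at this point.
df_padded = pd.read_fwf(io.StringIO(SAMPLE), widths=[4, 5], header=None)

# Convert each column to text and left-pad it with zeros to its width.
for col, width in COLUMN_WIDTHS.items():
    df_padded[col] = df_padded[col].astype(str).apply(
        lambda s, w=width: s.zfill(w)
    )

print(df_padded)
```

Keeping the widths in one dict means the same numbers could also be passed to read_fwf, so the padding cannot drift from the parse widths.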
This works fine in pandas 0.20.2 and later, where read_fwf accepts the dtype argument.
Output:
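For illustration, a minimal sketch of the dtype approach, assuming pandas 0.20.2 or later and a made-up two-column sample with widths [4, 5]. The names SAMPLE and df_dtype are hypothetical, and the built-in str is used in place of np.str, since that alias has been removed from recent NumPy releases.

```python
import io

import pandas as pd

# Hypothetical sample: two fixed-width columns, 4 and 5 characters wide.
SAMPLE = (
    "000100012\n"
    "002000345\n"
    "030006789\n"
)

# Ask for both columns as strings so the leading zeros are preserved.
# Assumes a pandas version in which read_fwf supports `dtype`.
df_dtype = pd.read_fwf(
    io.StringIO(SAMPLE),
    widths=[4, 5],
    header=None,
    dtype={0: str, 1: str},
)

print(df_dtype)
```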