I have a folder with a bunch of dbf files I would like to convert to csv. I tried using code that simply changes the extension from .dbf to .csv, and these files open fine in Excel, but when I open them in pandas they look like this:
s\t�
0 NaN
1 1 176 1.58400000000e+005-3.385...
This is not what I want, and those characters don't appear in the real file.
How should I read in the dbf file correctly?
Looking online, there are a few options:
With simpledbf:
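A minimal sketch of that option, assuming simpledbf and pandas are installed and that `Dbf5` and its `to_dataframe` method behave as described in the simpledbf documentation. The helper names `dbf_to_dataframe` and `load_folder` are made up for illustration.

```python
from pathlib import Path


def dbf_to_dataframe(dbf_path, codec="utf-8"):
    """Load one .dbf file into a pandas DataFrame via simpledbf."""
    # Imported lazily so this module still loads where simpledbf is missing.
    from simpledbf import Dbf5

    return Dbf5(str(dbf_path), codec=codec).to_dataframe()


def load_folder(folder):
    """Return {file stem: DataFrame} for every *.dbf file directly in `folder`."""
    return {
        path.stem: dbf_to_dataframe(path)
        for path in sorted(Path(folder).glob("*.dbf"))
    }
```

Calling `df.to_csv("some_name.csv", index=False)` on any of the returned DataFrames then writes a real csv file, rather than a renamed binary dbf file.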
Tweaked from the gist:
Here is my solution that I've been using for years. I have a solution for Python 2.7 and one for Python 3.5 (probably also 3.6).
Python 2.7:
Python 3.5:
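A minimal sketch of a Python 3 conversion with dbfread, assuming its `DBF` class yields dict-like records and exposes `field_names`. The function names `write_records` and `dbf_to_csv` are illustrative and not part of dbfread.

```python
import csv
from pathlib import Path


def write_records(csv_path, field_names, records):
    """Write dict-like records to csv_path, preceded by a header row."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(field_names)
        for record in records:
            # Follow field_names so column order matches the header.
            writer.writerow([record[name] for name in field_names])


def dbf_to_csv(dbf_path):
    """Convert one .dbf file to a .csv with the same name and folder."""
    # Imported lazily so this module still loads where dbfread is missing.
    from dbfread import DBF

    table = DBF(str(dbf_path))  # records are read lazily while iterating
    csv_path = Path(dbf_path).with_suffix(".csv")
    write_records(csv_path, table.field_names, table)
    return csv_path
```

To convert a whole folder, one could loop over `Path(folder).glob("*.dbf")` and call `dbf_to_csv` on each path.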
You can install dbfpy and dbfread with pip.
EDIT#2:
It's possible to read a dbf file line by line, without converting it to csv, using dbfread (simply install it with pip install dbfread).
My updated references:
official project site: http://pandas.pydata.org
official documentation: http://pandas-docs.github.io/pandas-docs-travis/
dbfread: https://pypi.python.org/pypi/dbfread/2.0.6
geopandas: http://geopandas.org/
shp and dbf with geopandas: https://gis.stackexchange.com/questions/129414/only-read-specific-attribute-columns-of-a-shapefile-with-geopandas-fiona
Using my dbf library you could do something like:
which will create a .csv file of the same name as each dbf file. If you put that code into a script named dbf2csv.py you could then call it as