Using pandas 18.1...
I'm trying to iterate through a folder of CSVs to read each CSV and send it to an Oracle database table. There is a non-ascii character lurking in one of my many CSVs (more like reveling in my anguish). I keep getting this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xab' in position 8:
ordinal not in range(128)
Here's the code:
import pandas as pd
import pandas.io.sql as psql
from sqlalchemy import create_engine
import cx_Oracle as cx
engine = create_engine('oracle+cx_oracle://schema:'+pwd+'@server:port/service_name'
,encoding='latin1')
name='table'
path=r'path_to_folder'
filelist = os.listdir(path)
for file in filelist:
df = pd.read_csv(pathc+'\\'+file,encoding='latin1',index_col=0)
df=df.astype('unicode')
df['date'] = pd.to_datetime(df['date'])
df['date'] = pd.to_datetime(df['Contract_EffDt'],format='%YYYY-%mm-%dd')
df.to_sql(name, engine, if_exists = 'append')
I've tried the following:
- encoding=utf-8 (in the engine, if I do that in read_csv, it throws an error)
- Adding ?encoding=utf8 after "service_name" in the engine
- Using df=df.astype('unicode') (and not)
What I want to do: Replace the unreadable character with something else and, most importantly, proceed with sending data to Oracle.
Note:
The data file I'm using are from the cms.gov site. Here's a zip file with an example. I'm using the "contracts_info" file.
Thanks in advance!