Pandas writing dataframe to CSV file

Posted 2019-01-03 00:45

I have a dataframe in pandas which I would like to write to a CSV file. I am doing this using:

df.to_csv('out.csv')

And getting the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b1' in position 20: ordinal not in range(128)

Is there any way to get around this easily (i.e., I have unicode characters in my data frame)? And is there a way to write to a tab-delimited file instead of a CSV, using e.g. a 'to-tab' method (which I don't think exists)?

7 Answers
仙女界的扛把子
#2 · 2019-01-03 00:59

If you don't want the index:

df.to_csv("out.csv", index=False)
够拽才男人
#3 · 2019-01-03 01:05

To delimit by a tab you can use the sep argument of to_csv:

df.to_csv(file_name, sep='\t')

To use a specific encoding (e.g. 'utf-8') use the encoding argument:

df.to_csv(file_name, sep='\t', encoding='utf-8')
Deceive 欺骗
#4 · 2019-01-03 01:07

Sometimes you run into these problems even if you specify UTF-8 encoding. I recommend specifying the encoding when reading the file and using the same encoding when writing to the file. That might solve your problem.
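As a minimal sketch of that idea (the file names and the utf-8 choice here are just assumptions for illustration):

import pandas as pd

# Read with an explicit encoding (use whatever encoding your data actually is)
df = pd.read_csv('in.csv', encoding='utf-8')

# ... work with the DataFrame ...

# Write it back out with the same encoding
df.to_csv('out.csv', encoding='utf-8', index=False)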

一夜七次
#5 · 2019-01-03 01:07

df.to_csv('out.csv', sep=',')

It will definitely work.

Change df to the name of your DataFrame and run it (e.g. in Anaconda's IDLE).

手持菜刀,她持情操
#6 · 2019-01-03 01:08

I'd like to add something to what Andy Hayden already mentioned in his answer.

When you are storing a DataFrame object into a CSV file using the to_csv method, you probably won't need to store the index of each row of the DataFrame object.

You can avoid that by passing a False boolean value to the index parameter.

Somewhat like:

df.to_csv(file_name, encoding='utf-8', index=False)

So if your DataFrame object is something like:

  Color  Number
0   red     22
1  blue     10

The csv file will store:

Color,Number
red,22
blue,10

instead of (the case when the default value True is passed):

,Color,Number
0,red,22
1,blue,10

Found it worth sharing, Cheers! :-)

走好不送
#7 · 2019-01-03 01:16

Something else you can try, if you are having issues encoding to 'utf-8' and want to go cell by cell, is the following.

Python 2

(Where "df" is your DataFrame object.)

for column in df.columns:
    for idx in df[column].index:
        x = df.get_value(idx, column)
        try:
            # Coerce every cell to unicode, dropping characters that fail to encode/decode
            x = unicode(x.encode('utf-8', 'ignore'), errors='ignore') if type(x) == unicode else unicode(str(x), errors='ignore')
            df.set_value(idx, column, x)
        except Exception:
            # If a cell still can't be converted, report it and blank it out
            print 'encoding error: {0} {1}'.format(idx, column)
            df.set_value(idx, column, '')
            continue

Then try:

df.to_csv(file_name)

You can check the type of each column's values (str vs. unicode) with:

for column in df.columns:
    print '{0} {1}'.format(str(type(df[column][0])),str(column))

Warning: errors='ignore' will just omit the character, e.g.:

IN: unicode('Regenexx\xae',errors='ignore')
OUT: u'Regenexx'

Python 3

for column in df.columns:
    for idx in df[column].index:
        # Note: get_value/set_value were deprecated in later pandas versions in favour of df.at
        x = df.get_value(idx, column)
        try:
            # Keep strings as they are; coerce everything else to str, dropping undecodable bytes
            x = x if type(x) == str else str(x).encode('utf-8', 'ignore').decode('utf-8', 'ignore')
            df.set_value(idx, column, x)
        except Exception:
            # If a cell still can't be converted, report it and blank it out
            print('encoding error: {0} {1}'.format(idx, column))
            df.set_value(idx, column, '')
            continue
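For completeness, the same column check works in Python 3 by using print as a function (a small sketch, assuming df as above):

for column in df.columns:
    print('{0} {1}'.format(str(type(df[column][0])), str(column)))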