invalid byte sequence for encoding “UTF8”

2019-01-03 22:40发布

I'm trying to import some data into my database. So I've created a temporary table,

create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));

And now I'm trying to import the data,

 copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv

But then I get the error,

ERROR:  invalid byte sequence for encoding "UTF8": 0xc92c

How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp table? Or should I attempt to change the encoding of the file?

16条回答
别忘想泡老子
2楼-- · 2019-01-03 23:27

This error may occur if input data contain escape character itself. By default escape character is "\" symbol, so if your input text contain "\" character - try to change the default value using ESCAPE option.

查看更多
戒情不戒烟
3楼-- · 2019-01-03 23:28

Apparently I can just set the encoding on the fly,

 set client_encoding to 'latin1'

And then re-run the query. Not sure what encoding I should be using though.


latin1 made the characters legible, but most of the accented characters were in upper-case where they shouldn't have been. I assumed this was due to a bad encoding, but I think its actually the data that was just bad. I ended up keeping the latin1 encoding, but pre-processing the data and fixed the casing issues.

查看更多
兄弟一词,经得起流年.
4楼-- · 2019-01-03 23:28

I had the same problem, and found a nice solution here: http://blog.e-shell.org/134

This is caused by a mismatch in your database encodings, surely because the database from where you got the SQL dump was encoded as SQL_ASCII while the new one is encoded as UTF8. .. Recode is a small tool from the GNU project that let you change on-the-fly the encoding of a given file.

So I just recoded the dumpfile before playing it back:

postgres> gunzip -c /var/backups/pgall_b1.zip | recode iso-8859-1..u8 | psql test

In Debian or Ubuntu systems, recode can be installed via package.

查看更多
beautiful°
5楼-- · 2019-01-03 23:29

If you are ok with discarding nonconvertible characters, you can use -c flag

iconv -c -t utf8 filename.csv > filename.utf8.csv

and then copy them to your table

查看更多
淡お忘
6楼-- · 2019-01-03 23:32

This error means that records encoding in the file is different with respect to the connection. In this case iconv may return the error, sometimes even despite //IGNORE flag:

iconv -f ASCII -t utf-8//IGNORE < b.txt > /a.txt

iconv: illegal input sequence at position (some number)

The trick is to find incorrect characters and replace it. To do it on Linux use "vim" editor:

vim (your text file), press "ESC": button and type ":goto (number returned by iconv)"

To find non ASCII characters you may use the following command:

grep --color='auto' -P "[\x80-\xFF]"

If you remove incorrect characters please check if you really need to convert your file: probably the problem is already solved.

查看更多
唯我独甜
7楼-- · 2019-01-03 23:33
copy tablename from 'filepath\filename' DELIMITERS '=' ENCODING 'WIN1252';

you can try this to handle UTF8 encoding.

查看更多
登录 后发表回答