invalid byte sequence for encoding “UTF8”

2019-01-03 22:40发布

I'm trying to import some data into my database. So I've created a temporary table,

create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));

And now I'm trying to import the data,

 copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv

But then I get the error,

ERROR:  invalid byte sequence for encoding "UTF8": 0xc92c

How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp table? Or should I attempt to change the encoding of the file?

16条回答
Ridiculous、
2楼-- · 2019-01-03 23:15

It depends on what type of machine/encoding generated your import file.

If you're getting it from an English or Western European version of Windows, your best bet is probably setting it to 'WIN1252'. If you are getting it from a different source, consult the list of character encodings here:

http://www.postgresql.org/docs/8.3/static/multibyte.html

If you're getting it from a Mac, you may have to run it through the "iconv" utility first to convert it from MacRoman to UTF-8.

查看更多
Explosion°爆炸
3楼-- · 2019-01-03 23:15

It is also very possible with this error that the field is encrypted in place. Be sure you are looking at the right table, in some cases administrators will create an unencrypted view that you can use instead. I recently encountered a very similar issue.

查看更多
疯言疯语
4楼-- · 2019-01-03 23:16
psql=# copy tmp from '/path/to/file.csv' with delimiter ',' csv header encoding 'windows-1251';

Adding encoding option worked in my case.

查看更多
混吃等死
5楼-- · 2019-01-03 23:21

Open file CSV by Notepad++ . Choose menu Encoding \ Encoding in UTF-8, then fix few cell manuallly.

Then try import again.

查看更多
贼婆χ
6楼-- · 2019-01-03 23:23

If you need to store UTF8 data in your database, you need a database that accepts UTF8. You can check the encoding of your database in pgAdmin. Just right-click the database, and select "Properties".

But that error seems to be telling you there's some invalid UTF8 data in your source file. That means that the copy utility has detected or guessed that you're feeding it a UTF8 file.

If you're running under some variant of Unix, you can check the encoding (more or less) with the file utility.

$ file yourfilename
yourfilename: UTF-8 Unicode English text

(I think that will work on Macs in the terminal, too.) Not sure how to do that under Windows.

If you use that same utility on a file that came from Windows systems (that is, a file that's not encoded in UTF8), it will probably show something like this:

$ file yourfilename
yourfilename: ASCII text, with CRLF line terminators

If things stay weird, you might try to convert your input data to a known encoding, to change your client's encoding, or both. (We're really stretching the limits of my knowledge about encodings.)

You can use the iconv utility to change encoding of the input data.

iconv -f original_charset -t utf-8 originalfile > newfile

You can change psql (the client) encoding following the instructions on Character Set Support. On that page, search for the phrase "To enable automatic character set conversion".

查看更多
forever°为你锁心
7楼-- · 2019-01-03 23:23

Well I was facing the same problem. And what solved my problem is this:

In excel click on Save as. From save as type, choose .csv Click on Tools. Then choose web options from drop down list. Under Encoding tab, save the document as Unicode(UTF-8). Click OK. Save the file. DONE !

查看更多
登录 后发表回答