I'm trying to import some data into my database. So I've created a temporary table,
create temporary table tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));
And now I'm trying to import the data,
copy tmp from '/home/mark/Desktop/Canada.csv' delimiter ',' csv
But then I get the error,
ERROR: invalid byte sequence for encoding "UTF8": 0xc92c
How do I fix that? Do I need to change the encoding of my entire database (if so, how?) or can I change just the encoding of my tmp
table? Or should I attempt to change the encoding of the file?
This error may also occur if the input data contains the escape character itself. In COPY's default text format the escape character is the "\" symbol, so if your input text contains a "\" character, try importing in CSV format, where the escape character can be changed with the ESCAPE option.
Apparently I can just set the encoding on the fly,
And then re-run the query. Not sure what encoding I should be using though.
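For reference, here is a minimal sketch of what setting the encoding on the fly might look like when driven from a shell. It assumes psql is installed and can connect to your database, and that LATIN1 is the right guess for the file; the function name copy_with_encoding_sql is made up for illustration. The statements are generated together because the temporary table only exists in the session that created it, so the SET and the COPY have to run in that same session.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL for one psql session.
# $1 = encoding the CSV file is assumed to be in (e.g. LATIN1)
# $2 = path to the CSV file as seen by the database server
copy_with_encoding_sql() {
    encoding=$1
    csv_path=$2
    cat <<SQL
CREATE TEMPORARY TABLE tmp(pc varchar(10), lat decimal(18,12), lon decimal(18,12), city varchar(100), prov varchar(2));
SET client_encoding TO '$encoding';
COPY tmp FROM '$csv_path' DELIMITER ',' CSV;
SQL
}

# Preview the statements; pipe them into psql to actually run them.
copy_with_encoding_sql LATIN1 /home/mark/Desktop/Canada.csv
```

To run it, pipe the output into psql, for example `copy_with_encoding_sql LATIN1 /home/mark/Desktop/Canada.csv | psql yourdb`, where yourdb is a placeholder. When COPY is given no ENCODING option, it reads the file using the current client encoding, which is why the SET has an effect here.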
latin1 made the characters legible, but most of the accented characters were in upper-case where they shouldn't have been. I assumed this was due to a bad encoding, but I think it's actually the data that was just bad. I ended up keeping the latin1 encoding, but pre-processed the data and fixed the casing issues.

I had the same problem, and found a nice solution here: http://blog.e-shell.org/134
So I just recoded the dumpfile before playing it back:
On Debian or Ubuntu systems, recode can be installed from the package repositories.
If you are OK with discarding nonconvertible characters, you can use the -c flag and then copy the converted rows to your table.
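As a rough sketch of the conversion step, the snippet below uses iconv rather than recode. The sample row is made up: it ends the city name with the single byte 0xC9 (É in Latin-1) followed by a comma, which is exactly the 0xc92c sequence a UTF8 database rejects. ISO-8859-1 (what PostgreSQL calls LATIN1) is only an assumption about the real file's encoding.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample row; \311 is octal for byte 0xC9 (Latin-1 "É").
printf 'X0X 0X0,0.0,0.0,SAINT-HONOR\311,QC\n' > "$workdir/latin1.csv"

# Lossless route: declare the source encoding and convert to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 "$workdir/latin1.csv" > "$workdir/utf8.csv"

# Lossy route: pretend the file is UTF-8 and let -c drop what cannot be
# converted. The exit status may be non-zero even though output is written.
iconv -c -f UTF-8 -t UTF-8 "$workdir/latin1.csv" > "$workdir/dropped.csv" || true

cat "$workdir/utf8.csv"
```

The first route keeps the accented letter, the second one loses it, so -c is only worth using when the stray bytes really are garbage.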
This error means that the encoding of the records in the file differs from the encoding of the connection. In this case iconv may also return an error, sometimes even with the //IGNORE flag:
iconv -f ASCII -t utf-8//IGNORE < b.txt > /a.txt
iconv: illegal input sequence at position (some number)
The trick is to find the incorrect characters and replace them. To do this on Linux, use the "vim" editor:
vim (your text file), press the "ESC" key and type ":goto (number returned by iconv)"
To find non ASCII characters you may use the following command:
grep --color='auto' -P "[\x80-\xFF]"
If you remove the incorrect characters, check whether you really still need to convert your file: the problem is probably already solved.
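One way to do that check, sketched under the assumption that iconv is available: converting from UTF-8 to UTF-8 fails on the first invalid byte sequence, so the exit status tells you whether the file still needs attention. The function name is_valid_utf8 and the sample files are made up for illustration.

```shell
#!/bin/sh
# Succeeds (exit 0) only if the whole file decodes as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Two hypothetical sample files: the first is proper UTF-8,
# the second contains a lone 0xC9 byte followed by a comma.
checkdir=$(mktemp -d)
printf 'SAINT-HONOR\303\211,QC\n' > "$checkdir/ok.csv"
printf 'SAINT-HONOR\311,QC\n' > "$checkdir/bad.csv"

for f in "$checkdir/ok.csv" "$checkdir/bad.csv"; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8, no conversion needed"
    else
        echo "$f: invalid UTF-8, still needs fixing"
    fi
done
```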
you can try this to handle UTF8 encoding.