Character with encoding UTF8 has no equivalent in

2020-02-23 05:03发布

问题:

I am getting the following exception:

Caused by: org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding "UTF8" has no equivalent in "WIN1252"

Is there a way to eradicate such characters, either via SQL or programmatically?
(SQL solution should be preferred).

I was thinking of connecting to the DB using WIN1252, but it will give the same problem.

回答1:

What do you do when you get this message? Do you import a file to Postgres? As devstuff said it is a BOM character. This is a character Windows writes as first to a text file, when it is saved in UTF8 encoding - it is invisible, 0-width character, so you'll not see it when opening it in a text editor.

Try to open this file in for example Notepad, save-as it in ANSI encoding and add (or replace similar) set client_encoding to 'WIN1252' line in your file.



回答2:

I had a similar issue, and I solved by setting the encoding to UTF8 with \encoding UTF8 in the client before attempting an INSERT INTO foo (SELECT * from bar WHERE x=y);. My client was using WIN1252 encoding but the database was in UTF8, hence the error.

More info is available on the PostgreSQL wiki under Character Set Support (devel docs).



回答3:

Don't eridicate the characters, they're real and used for good reasons. Instead, eridicate Win1252.



回答4:

I had a very similar issue. I had a linked server from SQL Server to a PostgreSQL database. Some data I had in the table I was selecting from using an openquery statement had some character that didn't have an equivalent in Win1252. The problem was that the System DSN entry (to be found under the ODBC Data Source Administrator) I had used for the connection was configured to use PostgreSQL ANSI(x64) rather than PostgreSQL Unicode(x64). Creating a new data source with the Unicode support and creating a new modified linked server and refernecing the new linked server in your openquery resolved the issue for me. Happy days.



回答5:

That looks like the byte sequence 0xBD, 0xBF, 0xEF as a little-endian integer. This is the UTF8-encoded form of the Unicode byte-order-mark (BOM) character 0xFEFF.

I'm not sure what Postgre's normal behaviour is, but the BOM is normally used only for encoding detection at the beginning of an input stream, and is usually not returned as part of the result.

In any case, your exception is due to this code point not having a mapping in the Win1252 code page. This will occur with most other non-Latin characters too, such as those used in Asian scripts.

Can you change the database encoding to be UTF8 instead of 1252? This will allow your columns to contain almost any character.



回答6:

I was able to get around it by using Postgres' substring function and selecting that instead:

select substring(comments from 1 for 200) from billing

The comment that the special character started each field was a great help in finally resolving it.



回答7:

This problem appeared for us around 19/11/2016 with our old Access 97 app accessing a postgresql 9.1 DB.

This was solved by changing the driver to UNICODE instead of ANSI (see plang comment).



回答8:

Here's what worked for me : 1 enable ad-hoc queries in sp_configure. 2 add ODBC DSN for your linked PostgreSQL server. 3 make sure you have both ANSI and Unicode (x64) drivers (try with both). 4 run query like this below - change UID, server ip, db name and password. 5 just keep the query in last line in postgreSQL format.

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO
EXEC sp_configure 'ad hoc distributed queries', 1
RECONFIGURE
GO

SELECT * FROM OPENROWSET('MSDASQL', 
'Driver=PostgreSQL Unicode(x64); 
uid=loginid;
Server=1.2.3.41;
port=5432;
database=dbname;
pwd=password',

'select * FROM table_name limit 10;')