How to convert a db in postgreSQL to utf8?

2019-03-25 06:30发布

问题:

I've just got a db in postgreSQL for my project and just realized it's in SQL_ASCII encoding, which means "no encoding" I think.

So what is the simplest way to convert this to utf8? And I know the db should be in latin1, does the conversion will damage the content?

Thanks!

回答1:

Converting to UTF8 should not damage your data as (I believe) all characters in SQL_ASCII also exist in utf8; they just have different byte codes.

Your best bet is to re-build your database. That is dump it, create a utf8 database then restore the dump to that new database.

postgres pg_dump --encoding utf8 main -f main.sql
createdb -E utf8 newMain
psql -f main.sql -d newMain

You can then of course rename the databases once you are happy that the new UTF8 one matches your data.



回答2:

I resolved using these commands;

1-) Export

pg_dump --username=postgres --encoding=ISO88591 database -f database.sql

and after

2-) Import

psql -U postgres -d database < database.sql

these commands helped me solve the problem of conversion SQL_ASCII - UTF-8



回答3:

UTF-8 conversion is all about what kind of characters where saved in the non UTF-8 db: depending on the data the proposed solution may fail. I managed to convert mine following this tutorial, using recode (a small tool from the GNU project that let you change on-the-fly the encoding of a given file) and I came up with this:

pg_dump -v --encoding utf8 -Fc -Z9 -c -f origindb.sql.bin iso8859-1-db

pg_restore origindb.sql.bin | recode iso-8859-1..u8 | psql --dbname utf8converteddb


回答4:

I searched the entire internet looking for a solution to this issue and Koyots solution above worked first time after wasting countless hours trying everything to migrate an old SQL_ASCII database to a new UTF8 database

To expand upon the solution...

  • I first redirected all websites to a maintenance page
  • Renamed the database by appending "_ascii" to it's name just to be sure nothing could connect to it and also so I know after that this was the original database !!
  • Created a new utf8 database with "_utf8" appended to the name (append TEMPLATE=template0 to the CREATE DATABASE STATEMENT)
  • Backed up the ascii database
  • Restored the backup to the new utf8 database
  • Renamed the utf8 database back to what I had named it before
  • Check database total size is roughly the same size as the original database. Won't match exactly due to dead tuples etc. New database should be smaller based on fill factor etc.
  • Turn off website redirects
  • Test all websites

I'd suggest keeping both databases for a couple of weeks until you are sure you have not lost any data (provided you can spare the disk space)