Get mysqldump to dump data suitable for psql input

2020-07-18 06:29发布

I'm trying to port a database from MySQL to PostgreSQL. I've rebuilt the schema in Postgres, so all I need to do is get the data across, without recreating the tables.

I could do this with code that iterates all the records and inserts them one at a time, but I tried that and it's waaayyyy to slow for our database size, so I'm trying to use mysqldump and a pipe into psql instead (once per table, which I may parallelize once I get it working).

I've had to jump through various hoops to get this far, turning on and off various flags to get a dump that is vaguely sane. Again, this only dumps the INSERT INTO, since I've already prepared the empty schema to get the data into:

      /usr/bin/env \
      PGPASSWORD=mypassword \
      mysqldump \
      -h mysql-server \
      -u mysql-username \
      --password=mysql-password \
      mysql-database-name \
      table-name \
      --compatible=postgresql \
      --compact \
      -e -c -t \
      --default-character-set=utf8 \
      | sed "s/\\\\\\'/\\'\\'/g" \
      | psql \
      -h postgresql-server \
      --username=postgresql-username \
      postgresql-database-name

Everything except that ugly sed command is manageable. I'm doing that sed to try and convert MySQL's approach to quoting single-quotes inside of strings ('O\'Connor') o PostgreSQL's quoting requirements ('O''Connor'). It works, until there are strings like this in the dump: 'String ending with a backslash \\'... and yes, it seems there is some user input in our database that has this format, which is perfectly valid, but doesn't pass my sed command. I could add a lookbehind to the sed command, but I feel like I'm crawling into a rabbit hole. Is there a way to either:

a) Tell mysqldump to quote single quotes by doubling them up b) Tell psql to expect backslashes to be interpreted as quoting escapes?

I have another issue with BINARY and bytea differences, but I've worked around that with a base64 encoding/decoding phase.

EDIT | Looks like I can do (b) with set backslash_quote = on; set standard_conforming_strings = off;, though I'm not sure how to inject that into the start of the piped output.

3条回答
Melony?
2楼-- · 2020-07-18 06:37

The file psqlrc and ~/.psqlrc may contain SQL commands to be executed when the client starts. You can put these three lines, or any other settings you would like in that file.

SET standard_conforming_strings = 'off';
SET backslash_quote = 'on';
SET escape_string_warning = 'off';

These settings for psql combined with the following mysqldump command will successfully migrate data only from mysql 5.1 to postgresql 9.1 with UTF-8 text (Chinese in my case). This method may be the only reasonable way to migrate a large database if creating an intermediate file would be too large or too time consuming. This requires you manually migrate the schema, since the two database's data types are vastly different. Plan on typing out some DDL to get it right.

mysqldump \
--host=<hostname> \
--user=<username> \
--password=<password> \
--default-character-set=utf8 \
--compatible=postgresql \
--complete-insert \
--extended-insert \
--no-create-info \
--skip-quote-names \
--skip-comments \
--skip-lock-tables \
--skip-add-locks \
--verbose \
<database> <table> | psql -n -d <database>
查看更多
混吃等死
3楼-- · 2020-07-18 06:49

Dump the tables to TSV using mysqldump's --tab option and then import using psql's COPY method.

查看更多
唯我独甜
4楼-- · 2020-07-18 06:56

Try this:

sed -e "s/\\\\'/\\\\\\'/g" -e "s/\([^\\]\)\\\\'/\1\\'\\'/g"

Yeah, "Leaning Toothpick Syndrome", I know.

查看更多
登录 后发表回答