Gradually migrate from SQL Server to PostgreSQL

2020-06-06 01:44发布

Company has numerous applications running on SQL Server. The database is a bit of a mess.

The goal is to gradually move from SQL Server to PostgreSQL (another SQL Server instance is not an option)

An ideal scenario would be if new applications could connect to PostgreSQL, create a new table structure, but still be able to use/interact with data from legacy SQL Server (an app connecting to two database servers is not an option).

Foreign data wrappers do not seem to be an option, since the technology is very immature and in case of PostgreSQL, the foreign tables are read-only.

Another wild idea is to connect from SQL Server instance to PostgreSQL, new applications would connect to SQL Server, but use PostgreSQL's foreign database. That foreign database ( I guess ) would have access to host's database objects. And at one point developers would switch all the new apps from SQL Server to PostgreSQL.

And of course there is the possibility to try and sync the data.

Which would be the best option?

3条回答
别忘想泡老子
2楼-- · 2020-06-06 02:21

Funnily enough, the company I work for has made exactly the same migration (actually, we're still phasing out the last few MS SQL parts). The basic approach we've taken is to separate the database functionality into separate areas or applications.

  • Any completely new or heavily rewritten application is entirely in Postgres. This doesn't necessarily mean the application layer (in our case PHP) only connects to Postgres, as whole libraries or shared modules may remain on "legacy" schemas.
  • Central business data, such as core configuration, initially remained in MS SQL, with a script regularly exporting the data and importing it into read-only Postgres targets. We used a simple XML serialization for this, after finding the seemingly simpler option of CSV / TSV too fiddly to convert between the two platforms. We also had problems performing the process in reverse, as the import process was more prone to disruptive exclusive locks on MS SQL than Postgres.
  • Data which is only written to in one place (e.g. an admin panel) can be simultaneously inserted / updated in both old and new DBs. Obviously this carries a risk of someone manually creating inconsistencies, but has the benefit that both copies are equally up to date. It also requires care with auto-generated values, e.g. using SET IDENTITY_INSERT to force in matching IDs.

Converting individual queries is relatively easy, with the main problem being with CamelCase table and column names: SQL Server is case insensitive but case preserving, whereas Postgres is case sensitive but folds unquoted identifiers to lowercase. So SELECT FooID FROM ... not only looks for a column named fooid, but returns a field labelled fooid to the application, which will be expecting FooID. This required auditing a large amount of existing application code so that it would expect an underscore_separated version, e.g. foo_id, which is more in keeping with Postgres's behaviour.

查看更多
▲ chillily
3楼-- · 2020-06-06 02:28

Everything you suggest is a recipe for pain and failed migrations. People will rant and rave about how awful, slow and unreliable PostgreSQL is if you try to use this approach. It'd be a great political move by someone who wanted to keep SQL Server, but not a good way to migrate to PostgreSQL.

There's a read/write foreign data wrapper coming for newer Pg versions, but it'll initially only support other PostgreSQL servers. Supporting MS SQL would be a lot harder due to the need to translate sqlstates and error messages, search conditions, and more, so any wrapper would no doubt be quite limited and have less than great performance. As you say, FDW support is too immature at this point anyway.

There are just so many things you lose by trying to do a hybrid like this:

  • No foreign key integrity enforcement

  • Data types on each side might not behave 100% the same so data could be OK on one side and not on the other. Think timestamps/dates.

  • Efficient joins would require an extremely sophisticated foreign data wrapper - so what'll usually happen is that the whole table will get fetched then joined against locally. Performance will be terrible.

  • Writing queries becomes a nightmare when you're doing anything but the most trivial task. Function names differ, etc.

  • You lose or weaken many ACID properties and/or must use two phase commit, which sucks for performance.

Seriously, don't do this.

Syncing the DBs is probably even worse - unless it's one way, it's going to be a recipe for lost updates, deleted rows reappearing, and worse. Two-way sync is extremely hard.

Start preparing your apps for a move by making them able to run on both servers, but only one at a time. Once you've got the app ready to run on Pg, start doing some load testing and reliability testing with a migrated copy of the live data. then think about migrating, but have plans for how to reverse the move if you find last minute problems that force you to delay.

If you're adding entirely new parts to the app it might be reasonable to have them in Pg if they don't interact with the other data in the DB at all. That's pretty unlikely, though, and your sysadmins will still hate you when you tell them that you now need an atomic snapshot across two separate databases...

查看更多
唯我独甜
4楼-- · 2020-06-06 02:37

It's not a problem at all. You can move your data fully or partially to PostgreSQL. You can write stored functions inside of PostgreSQL in Java, Python or some other supported language and create views that use these functions. Your function have to connect to MSSQL on each execution. The views names and structure must represent your MSSQL tables in different databases. Only update in this case is little bit tricky, requires triggers and more code. In such way you can connect your PostgreSQL to any other SQL/NoSQL DB vendor. It works good, but slower than all your data inside PostgreSQL only. I believe that in some cases to connect to both vendors from application might be more simple, but it's your choice: you have options.

查看更多
登录 后发表回答