Ordering differences between Postgres instances on

2019-05-15 14:14发布

问题:

I have two Postgres 9.1 instances: one local, installed via Postgres.app on OS X, and one remote, on Heroku. I've ensured that lc_collate is en_US.UTF-8 on both machines but am still seeing different behavior between the two.

On my local instance, SELECT 'i' > 'N' returns t whereas remotely it returns f. Given that I've already checked lc_* on both systems, what explains the difference I'm seeing?

回答1:

From the point of view of Unicode, the case ordering is a customization. Excerpt from http://www.unicode.org/reports/tr10:

Case Ordering. Some dictionaries and authors collate uppercase before lowercase while others use the reverse, so that preference needs to be customizable. Sometimes the case ordering is mandated by the government, as in Denmark. Often it is simply a customization or user preference.

Mac OS X simply has a different case ordering than the OS used by Heroku. On Mac OS X:

$ LC_CTYPE=en_US.UTF-8 sort << EOF
> i
> N
> EOF

produces:

N
i

The exact same command and same data on Ubuntu 12.04 produces:

i
N

This has none to do with PostgreSQL, except for the fact that it uses the OS for collation, so these unfortunate discrepancies between different OS impact databases.

PostgreSQL 10 and ICU

Starting with version 10, PostgreSQL may use collations provided by the ICU library, for servers compiled with ICU. These collations can sort consistently across operating systems.