I have two Postgres 9.1 instances: one local, installed via Postgres.app on OS X, and one remote, on Heroku. I've ensured that lc_collate
is en_US.UTF-8
on both machines but am still seeing different behavior between the two.
On my local instance, SELECT 'i' > 'N'
returns t
whereas remotely it returns f
. Given that I've already checked lc_*
on both systems, what explains the difference I'm seeing?
From the point of view of Unicode, the case ordering is a customization.
Excerpt from http://www.unicode.org/reports/tr10:
Case Ordering. Some dictionaries and authors collate uppercase before
lowercase while others use the reverse, so that preference needs to be
customizable. Sometimes the case ordering is mandated by the
government, as in Denmark. Often it is simply a customization or user
preference.
Mac OS X simply has a different case ordering than the OS used by Heroku. On Mac OS X:
$ LC_CTYPE=en_US.UTF-8 sort << EOF
> i
> N
> EOF
produces:
N
i
The exact same command and same data on Ubuntu 12.04 produces:
i
N
This has none to do with PostgreSQL, except for the fact that it uses the OS for collation, so these unfortunate discrepancies between different OS impact databases.
PostgreSQL 10 and ICU
Starting with version 10, PostgreSQL may use collations provided by the ICU library, for servers compiled with ICU. These collations can sort consistently across operating systems.