So, i've recently come to realize that collation is a huge deal on postgres, and that many comments refer to OSX / locale support as "broken", which hasn't enlightened me. for the purposes of this question, i'm ignoring the table/column default aspects of collation, and specifying it explicitly.
- my laptop is osx with postgres 9.2.4
- my server is ubuntu with postgres 9.1.9
common to both:
# show lc_collate ;
en_US.UTF-8
# show lc_ctype ;
en_US.UTF-8
on my laptop:
select ',' < '-' collate "en_US.UTF-8" as result;
true
now, my server does not have collation "en_US.UTF-8", but it does have "en_US.utf8" (which i recognize is not the same thing, though i would expect it to behave the same)
select ',' < '-' collate "en_US.utf8" as result;
false
so, here's where i'm freaking out. "C" order would always say (for both machines) that ',' is less than '-', which my brain would agree with.
which utf8 implementation is correct? and if someone could point me at the definition that would help, as mostly i've only been able to find accusations of "broken" leveled at osx. So i'd be worried that i've been wrong my entire life thinking that comma orders before hyphen, but enter a reasonably reliant arbiter of text and unicode etc. python. which on the ubuntu server yields:
>>> print u',' < u'-', ',' < '-'
True True
So, I'm feeling a lot like this collation concept is more broken on my ubuntu server than my osx server. but i don't have a "proper" collation to create my "en_US.UTF-8" collation from ala "create collation", so i'm lost as to how to create parity, or which answer (true/false) i should be using as the correct reference. (besides personally siding with ascii order for what are, after all, ascii characters).
so, in a nutshell, which is the proper answer for en_US.UTF-8 ?