In Microsoft SQL Server, it's possible to specify an "accent insensitive" collation (for a database, table or column), which means that it's possible for a query like
SELECT * FROM users WHERE name LIKE 'João'
to find a row with a Joao
name.
I know that it's possible to strip accents from strings in PostgreSQL using the unaccent_string contrib function, but I'm wondering if PostgreSQL supports these "accent insensitive" collations so the SELECT
above would work.
Use the unaccent module for that - which is completely different from what you are linking to.
Install once per database with:
If you get an error like:
Install the contrib package on your database server like instructed in this related answer:
Among other things, it provides the function
unaccent()
you can use with your example (whereLIKE
seems not needed).Index
To use an index for that kind of query, create an index on the expression. However, Postgres only accepts
IMMUTABLE
functions for indexes. If a function can return a different result for the same input, the index could silently break.unaccent()
onlySTABLE
notIMMUTABLE
Unfortunately,
unaccent()
is onlySTABLE
, notIMMUTABLE
. According to this thread on pgsql-bugs, this is due to three reasons:search_path
, which can change easily.Some tutorials on the web instruct to just alter the function volatility to
IMMUTABLE
. This brute-force method can break under certain conditions.Others suggest a simple
IMMUTABLE
wrapper function (like I did myself in the past).There is an ongoing debate whether to make the variant with two parameters
IMMUTABLE
which declares the used dictionary explicitly. Read here or here.Another alternative would be this module with an IMMUTABLE
unaccent()
function by Musicbrainz, provided on Github. Haven't tested it myself. I think I have come up with a better idea:Best for now
I propose an approach that is at least as efficient as other solutions floating around, but safer: Create a wrapper function with the two-parameter form and "hard-wire" the schema for function and dictionary:
public
being the schema where you installed the extension (public
is the default).Previously, I had added
SET search_path = public, pg_temp
to the function - until I discovered that the dictionary can be schema-qualified, too, which is currently (pg 10) not documented. This version is a bit shorter and around twice as fast in my tests on pg 9.5 and pg 10.The updated version still doesn't allow function inlining because functions declared
IMMUTABLE
may not call non-immutable functions in the body to allow that. Hardly matters for performance while we make use of an expression index on thisIMMUTABLE
function:Security for client programs has been tightened with Postgres 10.3 / 9.6.8 etc. You need to schema-qualify function and dictionary as demonstrated when used in any indexes. See:
Adapt your queries to match the index (so the query planner can use it):
You don't need the function in the right expression. You can supply unaccented strings like
'Joao'
directly.Ligatures
In Postgres 9.5 or older ligatures like 'Œ' or 'ß' have to be expanded manually (if you need that), since
unaccent()
always substitutes a single letter:You will love this update to unaccent in Postgres 9.6:
Bold emphasis mine. Now we get:
Pattern matching
For
LIKE
orILIKE
with arbitrary patterns, combine this with the modulepg_trgm
in PostgreSQL 9.1 or later. Create a trigram GIN (typically preferable) or GIST expression index. Example for GIN:Can be used for queries like:
GIN and GIST indexes are more expensive to maintain than plain btree:
There are simpler solutions for just left-anchored patterns. More about pattern matching and performance:
pg_trgm
also provides useful operators for "similarity" (%
) and "distance" (<->
).Trigram indexes also support simple regular expressions with
~
et al. and case insensitive pattern matching withILIKE
:No, PostgreSQL does not support collations in that sense
PostgreSQL does not support collations like that (accent insensitive or not) because no comparison can return equal unless things are binary-equal. This is because internally it would introduce a lot of complexities for things like a hash index. For this reason collations in their strictest sense only affect ordering and not equality.
Workarounds
Full-Text-Search Dictionary that Unaccents lexemes.
For FTS, you can define your own dictionary using
unaccent
,Which you can then index with a functional index,
You can now query it very simply
See also
Unaccent by itself.
The
unaccent
module can also be used by itself without FTS-integration, for that check out Erwin's answerI'm pretty sure PostgreSQL relies on the underlying operating system for collation. It does support creating new collations, and customizing collations. I'm not sure how much work that might be for you, though. (Could be quite a lot.)