I'm writing a simple full text search library, and need case folding to check if two words are equal. For this use case, the existing `.to_lowercase()` and `.to_uppercase()` methods are not enough.
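As a quick illustration of why lowercasing or uppercasing falls short, here is a minimal standard-library-only sketch (the helper names `eq_by_lowercase` and `eq_by_uppercase` are hypothetical). It uses U+1E9E LATIN CAPITAL LETTER SHARP S: it lowercases to `ß`, which is not `ss`, and it is already uppercase, so it never becomes `SS`. Full case folding maps both `ẞ` and `ss` to `ss`.

```rust
/// Compare two strings after lowercasing both (hypothetical helper).
fn eq_by_lowercase(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

/// Compare two strings after uppercasing both (hypothetical helper).
fn eq_by_uppercase(a: &str, b: &str) -> bool {
    a.to_uppercase() == b.to_uppercase()
}

fn main() {
    let capital_sharp_s = "\u{1E9E}"; // ẞ LATIN CAPITAL LETTER SHARP S
    let double_s = "ss";

    // "ẞ" lowercases to "ß", which is not equal to "ss".
    assert!(!eq_by_lowercase(capital_sharp_s, double_s));
    // "ss" uppercases to "SS", but "ẞ" stays "ẞ".
    assert!(!eq_by_uppercase(capital_sharp_s, double_s));

    // Uppercasing does happen to handle the lowercase "ß" ("ß" -> "SS"),
    // which is why neither method alone is reliable.
    assert!(eq_by_uppercase("ß", "SS"));

    println!("neither method matches {capital_sharp_s:?} and {double_s:?}");
}
```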
From a quick search of crates.io, I can find libraries for normalization and word splitting, but not for case folding. `regex-syntax` does have case folding code, but it's not exposed in its API.

If there aren't any existing solutions, I might have to roll my own.
The unicase crate doesn't expose case folding directly, but it provides a generic wrapper type that implements `Eq`, `Ord` and `Hash` in a case-insensitive manner. The master branch (unreleased) supports both ASCII case folding (as an optimization) and Unicode case folding (though only invariant case folding is supported).

For my use case, I've found the caseless crate to be most useful.
As far as I know, this is the only library that combines case folding with normalization. This matters when you want, e.g., "㎒" (U+3392 SQUARE MHZ) and "mhz" to match. See the "Default Caseless Matching" section in Chapter 3 of the Unicode Standard for details on how this works.
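For reference, the three matching levels that section defines can be summarized roughly as follows (paraphrased; check the standard for the exact definitions). Here $\mathrm{toCasefold}$ is full default case folding, and NFD/NFKD are the canonical and compatibility decompositions:

$$
\begin{aligned}
\text{default:}\quad & \mathrm{toCasefold}(X) = \mathrm{toCasefold}(Y) \\
\text{canonical:}\quad & \mathrm{NFD}(\mathrm{toCasefold}(\mathrm{NFD}(X))) = \mathrm{NFD}(\mathrm{toCasefold}(\mathrm{NFD}(Y))) \\
\text{compatibility:}\quad & \mathrm{NFKD}(\mathrm{toCasefold}(\mathrm{NFKD}(\mathrm{toCasefold}(\mathrm{NFD}(X))))) \\
& \quad = \mathrm{NFKD}(\mathrm{toCasefold}(\mathrm{NFKD}(\mathrm{toCasefold}(\mathrm{NFD}(Y)))))
\end{aligned}
$$

Only the compatibility level applies NFKD, which rewrites U+3392 as "MHz" so that the following fold produces "mhz".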
Here's some example code that matches a string case-insensitively:
To get the case-folded string directly, you can use the `default_case_fold_str` function.

Caseless doesn't expose a corresponding function that normalizes as well, but you can write one using the unicode-normalization crate:
Note that multiple rounds of normalization and case folding are needed for a correct result.
(Thanks to BurntSushi5 for pointing me to this library.)