I would like to make a search on a collection in my mongodb database. In my collection, I have documents with the field "name" can be values like:
[i] "Palácio Guanabara", "Palácio da Cidade", "Festa Palácio", etc.
When a user types a search like "pala" or "palá" or "Pala" or "PalÁ", all those itens in [i] must build the result set.
I found that in MongoDB I could use regex in searches, like:
{ "name": { $regex: new Regex(".*pala.*", "i") } }
Ok, this approach is case insensitive and use the percent like logic from SQL ("%pala%"). But, it isn't ignore accents from the register in database.
I found another alternative with the $text index: https://docs.mongodb.org/manual/core/index-text/
This approach can ignore case sensitive and accents. But the "search" does not accepts a regex, so I can't search things like "%pala%".
Summing up, I want to make the following SQL query in MongoDB:
select * from collection where remove_accents(upper(name)) like '%Pala%'
And this query returning results with name like "palácio", "palacio", "PaláCiô", etc.
There is no magic bullet here inside of MongoDb.
But since you obviously changing user input anyway to create '%pala%', why not replace "a" with "[aá]" and wrap in ".*", this way you can use regex and have your diacritics.
Here are options not much work to create replaces.
French Letters
[a-zA-ZàâäôéèëêïîçùûüÿæœÀÂÄÔÉÈËÊÏΟÇÙÛÜÆŒ]
German Letters
The controversial capital letter for ß, now included in unicode, is missing in many fonts, so it might show on your screen as a question mark.
[a-zA-ZäöüßÄÖÜẞ]
Polish Letters
[a-pr-uwy-zA-PR-UWY-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ]
Note that there is no Q, V and X in Polish. But if you want to allow all English letters as well, use [a-zA-ZąćęłńóśźżĄĆĘŁŃÓŚŹŻ]
Italian Letters
[a-zA-ZàèéìíîòóùúÀÈÉÌÍÎÒÓÙÚ]
Spanish Letters
[a-zA-ZáéíñóúüÁÉÍÑÓÚÜ]
from
http://www.rexegg.com/regex-interesting-character-classes.html#languages
what happened if you use just :
find({name: {$regex: 'pala', $options: "i"}})
you used new Regex()
that may not valid constructor valid constructor is new RegExp()
This works for me!
Just replace vowels with a regex which contains accents vowels. Additionally can use $options: 'i'
to avoid case sensitive.
function diacriticSensitiveRegex(string = '') {
return string.replace(/a/g, '[a,á,à,ä]')
.replace(/e/g, '[e,é,ë]')
.replace(/i/g, '[i,í,ï]')
.replace(/o/g, '[o,ó,ö,ò]')
.replace(/u/g, '[u,ü,ú,ù]');
}
find ({ name: { $regex: diacriticSensitiveRegex('pala'), $options: 'i' } });