I need to do the following modifications to a varchar(20) field:
- substitute accents with normal letters (like è to e)
- after (1) remove all the chars not in a..z
for example
'aèàç=.32s df'
must become
'aeacsdf'
are there special stored functions to achieve this easily?
UPDATE: please provide a T-SQL not CLR solution. This is the workaround I temporarly did because it temporarly suits my needs, anyway using a more elegant approach would be better.
CREATE FUNCTION sf_RemoveExtraChars (@NAME nvarchar(50))
RETURNS nvarchar(50)
AS
BEGIN
declare @TempString nvarchar(100)
set @TempString = @NAME
set @TempString = LOWER(@TempString)
set @TempString = replace(@TempString,' ', '')
set @TempString = replace(@TempString,'à', 'a')
set @TempString = replace(@TempString,'è', 'e')
set @TempString = replace(@TempString,'é', 'e')
set @TempString = replace(@TempString,'ì', 'i')
set @TempString = replace(@TempString,'ò', 'o')
set @TempString = replace(@TempString,'ù', 'u')
set @TempString = replace(@TempString,'ç', 'c')
set @TempString = replace(@TempString,'''', '')
set @TempString = replace(@TempString,'`', '')
set @TempString = replace(@TempString,'-', '')
return @TempString
END
GO
Well, this isn't a whole lot better, but it's at least a tsql set solution
This will replace all accented chars...
result: aeiou
Hope this help you!
AFAIK, there isn't a direct mapping for unicode/UTF-X characters that "look similar". Unless someone has something much cooler, I'd suggest pursuing a brute-force approach so you can get your work done until then.
It sounds like you need to do 2 passes. The first pass would be to replace letters that look similar first, then go through and remove all remaining non-English letters second.
This article can help you create a user defined function so that you can use regular expressions instead of dozens of REPLACE calls: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Here is a dictionary that I've been using for this case:
Just two cents from me
I know it's not elegant but when collate is not working and you cannot make use of functions you can use this nested replace for most used diacritics. I post this so you don't have to type it yourself again :)
in postgress 10:
:)