How to remove accents and all chars <> a..z in

2020-01-26 04:19发布

I need to do the following modifications to a varchar(20) field:

  1. substitute accents with normal letters (like è to e)
  2. after (1) remove all the chars not in a..z

for example

'aèàç=.32s df' 

must become

'aeacsdf'

are there special stored functions to achieve this easily?

UPDATE: please provide a T-SQL not CLR solution. This is the workaround I temporarly did because it temporarly suits my needs, anyway using a more elegant approach would be better.

CREATE FUNCTION sf_RemoveExtraChars (@NAME nvarchar(50))
RETURNS nvarchar(50)
AS
BEGIN
  declare @TempString nvarchar(100)
  set @TempString = @NAME 
  set @TempString = LOWER(@TempString)
  set @TempString =  replace(@TempString,' ', '')
  set @TempString =  replace(@TempString,'à', 'a')
  set @TempString =  replace(@TempString,'è', 'e')
  set @TempString =  replace(@TempString,'é', 'e')
  set @TempString =  replace(@TempString,'ì', 'i')
  set @TempString =  replace(@TempString,'ò', 'o')
  set @TempString =  replace(@TempString,'ù', 'u')
  set @TempString =  replace(@TempString,'ç', 'c')
  set @TempString =  replace(@TempString,'''', '')
  set @TempString =  replace(@TempString,'`', '')
  set @TempString =  replace(@TempString,'-', '')
  return @TempString
END
GO

12条回答
Ridiculous、
2楼-- · 2020-01-26 04:46

Well, this isn't a whole lot better, but it's at least a tsql set solution

declare @TempString varchar(100)

set @TempString='textàè containing éìòaccentsç''''` and things-'

select @TempString=
    replace(
        replace(
            replace(
                replace(
                    replace(
                        replace(
                            replace(
                                replace(
                                    replace(
                                        replace(
                                            replace(@TempString,' ', '') 
                                        ,'à', 'a')
                                    ,'è', 'e') 
                                ,'é', 'e')
                            ,'ì', 'i')
                        ,'ò', 'o') 
                    ,'ù', 'u') 
                ,'ç', 'c') 
            ,'''', '') 
        ,'`', '')
    ,'-', '') 



select @TempString
查看更多
Animai°情兽
3楼-- · 2020-01-26 04:47
    SELECT 'áéíóú' COLLATE Cyrillic_General_CI_AI

This will replace all accented chars...

result: aeiou

Hope this help you!

查看更多
甜甜的少女心
4楼-- · 2020-01-26 04:48

AFAIK, there isn't a direct mapping for unicode/UTF-X characters that "look similar". Unless someone has something much cooler, I'd suggest pursuing a brute-force approach so you can get your work done until then.

It sounds like you need to do 2 passes. The first pass would be to replace letters that look similar first, then go through and remove all remaining non-English letters second.

This article can help you create a user defined function so that you can use regular expressions instead of dozens of REPLACE calls: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Here is a dictionary that I've been using for this case:

    public static Dictionary<char, string> NonEnglishLetterMapping = new Dictionary<char, string>
    {
          {'a', "áàâãäåāăą"}
        //, {'b', ""}
        , {'c', "ćĉċč"}
        , {'d', "ďđ"}
        , {'e', "éëêèēĕėę"}
        //, {'f', ""}
        , {'g', "ĝğġģ"}
        , {'h', "ĥħ"}
        , {'i', "ìíîïĩīĭįı"}
        , {'j', "ĵ"}
        , {'k', "ķĸ"}
        , {'l', "ĺļľŀł"}
        //, {'m', ""}
        , {'n', "ñńņňʼnŋ"}
        , {'o', "òóôõöōŏőơ"}
        //, {'p', ""}
        //, {'q', ""}
        , {'r', "ŕŗř"}
        , {'s', "śŝşšș"}
        , {'t', "ţťŧț"}
        , {'u', "ùúûüũūŭůűųư"}
        //, {'v', ""}
        , {'w', "ŵ"}
        //, {'x', ""}
        , {'y', "ŷ"}
        , {'z', "źżž"}
    };
查看更多
贪生不怕死
5楼-- · 2020-01-26 04:48

Just two cents from me

select * From XXX  
    where cast(word as varchar(max)) collate SQL_Latin1_General_CP1253_CI_AI = 'sluiten' collate SQL_Latin1_General_CP1253_CI_AI
查看更多
Root(大扎)
6楼-- · 2020-01-26 04:49

I know it's not elegant but when collate is not working and you cannot make use of functions you can use this nested replace for most used diacritics. I post this so you don't have to type it yourself again :)

select REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(myfield,'é','e'),'ê','e'),'ë','e'),'è','e'),'É','E'),'È','E'),'Ê','E'),'Ë','E'),'ð','D'),'Ð','D'),'â','a'),'à','a'),'á','a'),'ã','a'),'æ','a'),'à','a'),'å','a'),'Å','A'),'À','A'),'Á','A'),'Â','A'),'Ã','A'),'Æ','A'),'ä','a'),'Ä','A'),'ï','i'),'î','i'),'ì','i'),'í','i'),'Ì','I'),'Í','I'),'Î','I'),'Ï','I'),'ô','o'),'ò','o'),'ó','o'),'õ','o'),'ø','o'),'Ò','O'),'Ó','O'),'Ô','O'),'Õ','O'),'Ø','O'),'ö','o'),'Ö','O'),'û','u'),'ù','u'),'ú','u'),'Ù','U'),'Ú','U'),'Û','U'),'Ü','U'),'ü','u'),'ñ','n'),'Ñ','N'),'Ç','C'),'ç','c'),'ý','y'),'ÿ','y'),'Ý','Y'),'þ','T'),'Þ','t'),'ß','ss') from mytable
查看更多
再贱就再见
7楼-- · 2020-01-26 04:55

in postgress 10:

create extension unaccent;# as root in Your db, for each db
select unaccent("ąęśłóŻŹŁÓĄĘ");

:)

查看更多
登录 后发表回答