How do I remove diacritics (accents) from a string

I'm trying to convert some strings that are in French Canadian and basically, I'd like to be able to take out the French accent marks in the letters while keeping the letter. (E.g. convert é to e, so crème brûlée would become creme brulee)

What is the best method for achieving this?

标签： .net string diacritics

17条回答

公子世无双

2楼-- · 2018-12-31 01:27

I've not used this method, but Michael Kaplan describes a method for doing so in his blog post (with a confusing title) that talks about stripping diacritics: Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others)

static string RemoveDiacritics(string text) 
{
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    var stringBuilder = new StringBuilder();

    foreach (var c in normalizedString)
    {
        var unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(c);
        if (unicodeCategory != UnicodeCategory.NonSpacingMark)
        {
            stringBuilder.Append(c);
        }
    }

    return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
}

Note that this is a followup to his earlier post: Stripping diacritics....

The approach uses String.Normalize to split the input string into constituent glyphs (basically separating the "base" characters from the diacritics) and then scans the result and retains only the base characters. It's just a little complicated, but really you're looking at a complicated problem.

Of course, if you're limiting yourself to French, you could probably get away with the simple table-based approach in How to remove accents and tilde in a C++ std::string, as recommended by @David Dibben.

0人赞添加讨论(0) 举报

大哥的爱人

3楼-- · 2018-12-31 01:28

Popping this Library here if you haven't already considered it. Looks like there are a full range of unit tests with it.

https://github.com/thomasgalliker/Diacritics.NET

0人赞添加讨论(0) 举报

残风、尘缘若梦

4楼-- · 2018-12-31 01:29

This works fine in java.

It basically converts all accented characters into their deAccented counterparts followed by their combining diacritics. Now you can use a regex to strip off the diacritics.

import java.text.Normalizer;
import java.util.regex.Pattern;

public String deAccent(String str) {
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD); 
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
}

0人赞添加讨论(0) 举报

人气声优

5楼-- · 2018-12-31 01:29

you can use string extension from MMLib.Extensions nuget package:

using MMLib.RapidPrototyping.Generators;
public void ExtensionsExample()
{
  string target = "aácčeéií";
  Assert.AreEqual("aacceeii", target.RemoveDiacritics());
}

Nuget page: https://www.nuget.org/packages/MMLib.Extensions/ Codeplex project site https://mmlib.codeplex.com/

0人赞添加讨论(0) 举报

倾城一夜雪

6楼-- · 2018-12-31 01:32

In case anyone's interested, here is the java equivalent:

import java.text.Normalizer;

public class MyClass
{
    public static String removeDiacritics(String input)
    {
        String nrml = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder stripped = new StringBuilder();
        for (int i=0;i<nrml.length();++i)
        {
            if (Character.getType(nrml.charAt(i)) != Character.NON_SPACING_MARK)
            {
                stripped.append(nrml.charAt(i));
            }
        }
        return stripped.toString();
    }
}

0人赞添加讨论(0) 举报

美炸的是我

7楼-- · 2018-12-31 01:33

Imports System.Text
Imports System.Globalization

 Public Function DECODE(ByVal x As String) As String
        Dim sb As New StringBuilder
        For Each c As Char In x.Normalize(NormalizationForm.FormD).Where(Function(a) CharUnicodeInfo.GetUnicodeCategory(a) <> UnicodeCategory.NonSpacingMark)  
            sb.Append(c)
        Next
        Return sb.ToString()
    End Function

0人赞添加讨论(0) 举报

1 2 3 下一页

How do I remove diacritics (accents) from a string

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间