I have a script which creates users in Microsoft Exchange Server and Active Directory. So, though it's commmon that user's names have accents or ñ in Spain, I want to avoid them for the username to not to cause any incompatibilities in old systems.
So, how could I clean a string like this?
$name = "Ramón"
To be like that? :
$name = "Ramon"
Well I can help you with some of the code.....
I used this recently in a c# project to strip from email addresses:
static string RemoveDiacritics(string stIn)
{
string stFormD = (stIn ?? string.Empty).Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if (uc != UnicodeCategory.NonSpacingMark)
{
sb.Append(stFormD[ich]);
}
}
return (sb.ToString().Normalize(NormalizationForm.FormC));
}
I guess I can now say 'extending into a PowerShell script/form is left to the reader'.... hope it helps....
As per ip.'s answer, here is the Powershell version.
function Remove-Diacritics {
param ([String]$src = [String]::Empty)
$normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
$sb = new-object Text.StringBuilder
$normalized.ToCharArray() | % {
if( [Globalization.CharUnicodeInfo]::GetUnicodeCategory($_) -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
[void]$sb.Append($_)
}
}
$sb.ToString()
}
# Test data
@("Rhône", "Basíl", "Åbo", "", "Gräsäntörmä") | % { Remove-Diacritics $_ }
Output:
Rhone
Basil
Abo
Grasantorma
Another PowerShell translation of @ip for non C# coders ;o)
function Remove-Diacritics
{
param ([String]$sToModify = [String]::Empty)
foreach ($s in $sToModify) # Param may be a string or a list of strings
{
if ($sToModify -eq $null) {return [string]::Empty}
$sNormalized = $sToModify.Normalize("FormD")
foreach ($c in [Char[]]$sNormalized)
{
$uCategory = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($c)
if ($uCategory -ne "NonSpacingMark") {$res += $c}
}
return $res
}
}
Clear-Host
$name = "Un été de Raphaël"
Write-Host (Remove-Diacritics $name )
$test = ("äâûê", "éèà", "ùçä")
$test | % {Remove-Diacritics $_}
Remove-Diacritics $test
PS> [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding(1251).GetBytes("Ramón"))
Ramon
PS>
Another solution... quickly "reuse" your C# in PowerShell (C# code credits lost somewhere on the net).
Add-Type -TypeDefinition @"
using System.Text;
using System.Globalization;
public class Utils
{
public static string RemoveDiacritics(string stIn)
{
string stFormD = stIn.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int ich = 0; ich < stFormD.Length; ich++)
{
UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
if (uc != UnicodeCategory.NonSpacingMark)
{
sb.Append(stFormD[ich]);
}
}
return (sb.ToString().Normalize(NormalizationForm.FormC));
}
}
"@ | Out-Null
[Utils]::RemoveDiacritics("ABC-abc-ČŠŽ-čšž")
Instead of creating a stringbuilder and looping over characters, you can just use -replace on the NFD string to remove combining marks:
function Remove-Diacritics {
param ([String]$src = [String]::Empty)
$normalized = $src.Normalize( [Text.NormalizationForm]::FormD )
($normalized -replace '\p{M}', '')
}
With the help of the above examples I use this "one-liner:" in pipe (tested only in Win10):
"öüóőúéáűí".Normalize("FormD") -replace '\p{M}', ''
Result:
ouooueeui