Cyrillic encoding in C#

Posted 2019-01-15 08:09

I have a bunch of Cyrillic-like text in an MSSQL database and need to convert it to Cyrillic in C#.

So... Ðàáîòà â ãåðìàíèè

should become

Работа в германии

Any suggestions?

I should add that the closest I've gotten is ?aaioa a aa?iaiee

Here's the code I'm using:

 str = Encoding.UTF8.GetString(Encoding.GetEncoding("Windows-1251").GetBytes(drCurrent["myfield"].ToString()));
 str = Encoding.GetEncoding(1251).GetString(Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(1251), Encoding.UTF8.GetBytes(str)));

2 Answers
男人必须洒脱
#2 · 2019-01-15 08:34

ADO.NET exposes all string types from the SQL Server provider as C# strings, which means they have already been converted to Unicode by the time you see them. For non-Unicode source columns (as yours obviously is) such as char(n) or varchar(n), the ADO.NET SQL Server provider uses the column's collation to determine the encoding. So if your non-Unicode SQL Server data shows up in .NET with the wrong encoding, it must have been presented to the provider with the wrong collation. Choose an appropriate collation for your data and the provider will translate it using the matching encoding. For example, as documented in Collation and Code Page Architecture, Cyrillic collations map to code page 1251, which is very likely what you want. The SQL Server collation documentation has the details you need to fix the problem. The program below demonstrates the effect by storing the same CP1251 bytes in a Latin1 column and in a Cyrillic column, then reading both back.

using System;
using System.Text;
using System.Data.SqlClient;
using System.Windows.Forms;

public class Hello1
{
   public static void Main()
   {
    try
    {
        using (SqlConnection conn = new SqlConnection("server=.;integrated security=true"))
        {
            conn.Open ();

            // The .cs file must be saved as Unicode, obviously...
            //
            string s = "Работа в германии"; 

            byte[] b = Encoding.GetEncoding(1251).GetBytes (s);

            // Create a test table
            //
            SqlCommand cmd = new SqlCommand (
                @"create table #t (
                    c1 varchar(100) collate Latin1_General_CI_AS, 
                    c2 varchar(100) collate Cyrillic_General_CI_AS)", 
                conn);
            cmd.ExecuteNonQuery ();

            // Insert the same value twice, the original Unicode string
            // encoded as CP1251
            //
            cmd = new SqlCommand (
                @"insert into #t (c1, c2) values (@b, @b)", conn);
            cmd.Parameters.AddWithValue("@b", b);
            cmd.ExecuteNonQuery ();

            // Read the value as Latin collation 
            //
            cmd = new SqlCommand (
                @"select c1 from #t", conn);
            string def = (string) cmd.ExecuteScalar ();

            // Read the same value as Cyrillic collation
            //
            cmd = new SqlCommand (
                @"select c2 from #t", conn);
            string cyr = (string) cmd.ExecuteScalar ();

            // Cannot use Console.Write since the console is not Unicode
            //
            MessageBox.Show(String.Format(
                @"Original: {0}  Default collation: {1} Cyrillic collation: {2}", 
                    s, def, cyr));
        }

    }
    catch(Exception e)
    {
        Console.WriteLine (e);
    }   
   }
}

The result is:

---------------------------

---------------------------
Original: Работа в германии  Default collation: Ðàáîòà â ãåðìàíèè Cyrillic collation: Работа в германии
---------------------------
OK   
---------------------------
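
The same effect can be reproduced without a database, which may help when checking what went wrong. The following is only a sketch: `CollationDemo` and `EncodeCyrillicThenDecodeAs` are names made up for illustration, and the code assumes the stored bytes really are CP1251. It encodes the text as CP1251 and then decodes those bytes once as 1252 (what a Latin1 collation implies) and once as 1251 (what a Cyrillic collation implies).

```csharp
using System;
using System.Text;

public static class CollationDemo
{
    // Hypothetical helper: encodes text as CP1251 bytes, then decodes
    // those same bytes using the given code page.
    public static string EncodeCyrillicThenDecodeAs(string text, int decodeCodePage)
    {
        // Assumption: running on .NET Core / .NET 5+, where code pages 1251
        // and 1252 must be registered first. On .NET Framework, where they
        // are built in, this line can be removed.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] stored = Encoding.GetEncoding(1251).GetBytes(text);
        return Encoding.GetEncoding(decodeCodePage).GetString(stored);
    }

    public static void Main()
    {
        // Ask the console for UTF-8 so Cyrillic has a chance of displaying.
        Console.OutputEncoding = Encoding.UTF8;

        const string original = "Работа в германии";
        Console.WriteLine("As 1252: " + EncodeCyrillicThenDecodeAs(original, 1252));
        Console.WriteLine("As 1251: " + EncodeCyrillicThenDecodeAs(original, 1251));
    }
}
```

Decoding as 1252 should give the garbled Ðàáîòà â ãåðìàíèè from the question, and decoding as 1251 should give back the original text.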
够拽才男人
#3 · 2019-01-15 08:35
// To find out source and target
const string source = "Ðàáîòà â ãåðìàíèè";
const string destination = "Работа в германии";

foreach (var sourceEncoding in Encoding.GetEncodings())
{

    var bytes = sourceEncoding.GetEncoding().GetBytes(source);
    foreach (var targetEncoding in Encoding.GetEncodings())
    {
        if (targetEncoding.GetEncoding().GetString(bytes) == destination)
        {
            Console.WriteLine("Source Encoding: {0} TargetEncoding: {1}",sourceEncoding.CodePage,targetEncoding.CodePage);
        }

    }
}

// Result1: Source Encoding: 1252 TargetEncoding: 1251
// Result2: Source Encoding: 28591 TargetEncoding: 1251
// Result3: Source Encoding: 28605 TargetEncoding: 1251

// The code for you to use 
var decodedCyrillic = Encoding.GetEncoding(1251).GetString(Encoding.GetEncoding(1252).GetBytes(source));
// Result: Работа в германии
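
If you end up doing this re-decoding for many values, a stricter variant may be safer. This is a sketch, not a definitive fix: `MojibakeFixer.Fix` is a hypothetical name, and it assumes every affected string was CP1251 data mis-decoded as 1252. It requests exception fallbacks so that a character with no 1252 mapping raises an error instead of being silently replaced. That silent replacement is most likely what produced `?aaioa a aa?iaiee` in the question: the attempt there encodes the garbled text with 1251, which has no Ð or à, so those characters become `?` or a best-fit ASCII letter.

```csharp
using System;
using System.Text;

public static class MojibakeFixer
{
    // Hypothetical helper: recovers the original bytes by encoding with 1252,
    // then decodes them as 1251. Throws instead of substituting '?'.
    public static string Fix(string garbled)
    {
        // Assumption: .NET Core / .NET 5+, where these code pages must be
        // registered. Not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin = Encoding.GetEncoding(
            1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding cyrillic = Encoding.GetEncoding(
            1251, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        byte[] raw = latin.GetBytes(garbled);   // the bytes as stored in the column
        return cyrillic.GetString(raw);         // the text those bytes were meant to be
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(Fix("Ðàáîòà â ãåðìàíèè"));
    }
}
```

With this setup, passing text that is already correct Cyrillic should throw an `EncoderFallbackException` rather than return a string full of question marks, which makes it easier to spot rows that do not need fixing. Correcting the column collation, as the other answer describes, is still the cleaner long-term solution.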