Encoding of an RTF file

Posted 2020-02-15 09:18

Question:

I receive a base64-encoded string that represents an RTF file.

If I look at the original text representation (before base64 encoding), I see the character sequence F¸r. This should stand for Für when displayed in a viewer. The header of the RTF file contains ansicpg1252, so this should be the encoding unless it is changed elsewhere (escape sequences, font definitions, ...).

My problem now is that I can't correctly decode the base64 string to its original representation. I never get F¸r anymore. Instead I get Für or even F\'fcr. As a result, the umlaut is rendered incorrectly when the decoded RTF is displayed in a viewer.

So what is the original encoding of the rtf-file? Or what is going wrong here?

You can have a look into a sample file here. This is the base 64 encoded string I get.

Edit:

I don't have the code for the encoding, but I think I can reconstruct that. This is my code for this:

string path = "/some/path/ltxt1 Kopie.rtf";
byte[] document = File.ReadAllBytes(path);
string base64string = Convert.ToBase64String(document);
var isoBytes = Convert.FromBase64String(base64string);

File.WriteAllText ("/some/path/sketch.rtf", System.Text.Encoding.GetEncoding("iso-8859-1").GetString(isoBytes));

I tried to change the encoding, but with windows-1252 I get an error (in the sketch: encoding name not supported; in the real project: array not null).

Answer 1:

Your issue is not the encoding of the file. If you run your code and compare the results, the text is the same in each.

Your issue is that the source file is ANSI encoded and your second file is UTF-8 encoded, because File.WriteAllText writes UTF-8 when no encoding is passed. However, the RTF directive in the text (the ansicpg1252 part) tells whatever is interpreting the RTF that it is ANSI encoded, so the reader makes a total mess of decoding it due to the mismatch.
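To make the mismatch concrete, here is a minimal sketch that compares the bytes produced for the same text under UTF-8 and ISO-8859-1. The class name Utf8MismatchDemo and the helper Hex are made up for illustration, and the three-byte sample stands in for the real file content.

```csharp
using System;
using System.Text;

class Utf8MismatchDemo
{
    // Hypothetical helper: encodes text and returns the bytes as a hex string.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        // 0xFC is "ü" in both ISO-8859-1 and Windows-1252.
        byte[] original = { 0x46, 0xFC, 0x72 };

        var latin1 = Encoding.GetEncoding("ISO-8859-1");
        string text = latin1.GetString(original);

        // UTF-8 turns the single byte 0xFC into the two bytes 0xC3 0xBC.
        Console.WriteLine(Hex(text, Encoding.UTF8)); // 46-C3-BC-72

        // ISO-8859-1 reproduces the original bytes.
        Console.WriteLine(Hex(text, latin1));        // 46-FC-72
    }
}
```

An RTF reader that trusts ansicpg1252 treats 0xC3 and 0xBC as two separate characters, which is why the umlaut comes out wrong.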

The simplest way around this is to make sure you write it back to disk using the matching encoding:

var iso = Encoding.GetEncoding("ISO-8859-1");
File.WriteAllText("/some/path/sketch.rtf", iso.GetString(isoBytes), iso);

Or, more simply:

File.WriteAllBytes("/some/path/sketch.rtf", isoBytes);
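As a rough check of the byte-level approach, the sketch below round-trips a small buffer through base64 and confirms that the written file matches the source byte for byte. The class name Base64RoundTrip, the helper WriteDecoded, and the temp-file location are assumptions for illustration, and the three-byte buffer stands in for the real RTF content.

```csharp
using System;
using System.IO;
using System.Linq;

class Base64RoundTrip
{
    // Hypothetical helper: decodes base64 and writes the raw bytes unchanged.
    public static void WriteDecoded(string base64, string path)
    {
        byte[] bytes = Convert.FromBase64String(base64);
        File.WriteAllBytes(path, bytes);
    }

    static void Main()
    {
        // Stand-in for the RTF content; 0xFC is "ü" in code page 1252.
        byte[] source = { 0x46, 0xFC, 0x72 };
        string base64 = Convert.ToBase64String(source);

        string path = Path.Combine(Path.GetTempPath(), "sketch.rtf");
        WriteDecoded(base64, path);

        // No text encoding is involved, so the bytes survive intact.
        byte[] written = File.ReadAllBytes(path);
        Console.WriteLine(source.SequenceEqual(written)); // True
    }
}
```

Because the data never passes through a string, this version does not depend on which encodings the runtime supports.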