0x202A in filename: Why?

Posted 2019-07-19 12:08

I recently needed to do an ISNULL in SQL on a varbinary image.
So far so (ab)normal. I quickly wrote a C# program to read the file no_image.png from my desktop and output the bytes as a hex string.

That program started like this:

byte[] ba = System.IO.File.ReadAllBytes(@"‪D:\UserName\Desktop\no_image.png");
Console.WriteLine(ba.Length);
// From here, change ba to hex string

And as I had used ReadAllBytes countless times before, I figured it was no big deal.
To my surprise, ReadAllBytes threw a "NotSupported" exception.

I found the cause: this happens when I right-click the file, open the "Security" tab, and copy-paste the object name (starting the selection at the right and dragging imprecisely to the left).

And it happens only on Windows 8.1 (and perhaps 8), but not on Windows 7.

202A

When I output the string in question:

public static string ToHexString(string input)
{
    string strRetVal = null;
    System.Text.StringBuilder sb = new System.Text.StringBuilder();

    foreach (char c in input)
    {
        sb.Append(((int)c).ToString("X2"));
    }

    strRetVal = sb.ToString();
    sb.Length = 0;
    sb = null;

    return strRetVal;
} // End Function ToHexString

string str = ToHexString(@"‪D:\UserName\Desktop\cookie.png");
string strRight = " (" + ToHexString(@"D:\UserName\Desktop\cookie.png") + ")"; // Correct value, for comparison

string msg = str + Environment.NewLine + "  " + strRight;
Console.WriteLine(msg);

I get this:

202A443A5C557365724E616D655C4465736B746F705C636F6F6B69652E706E67
   (443A5C557365724E616D655C4465736B746F705C636F6F6B69652E706E67)

First thing: when I look up 20 2A in ASCII, it's [space] + *.

Since I see neither a space nor a star, I googled 20 2A, and the first thing I got was paragraph 202a of the German penal code: http://dejure.org/gesetze/StGB/202a.html

But I suppose that is an unfortunate coincidence, and it is actually the Unicode control character 'LEFT-TO-RIGHT EMBEDDING' (U+202A): http://www.fileformat.info/info/unicode/char/202a/index.htm
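
The "20 2A" misreading comes from formatting every char with "X2", which prints values above 0xFF as four digits with no separator. A minimal sketch of a less ambiguous dump follows; the class and method names (CodePointDump, ToCodeUnits) are hypothetical, and the invisible character is written as the escape \u202A so that it is visible in the source:

```csharp
using System;
using System.Text;

public static class CodePointDump
{
    // Formats every UTF-16 code unit as U+XXXX, separated by spaces,
    // so a four-digit value such as 202A cannot be misread as "20 2A".
    public static string ToCodeUnits(string input)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in input)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Prints: U+202A U+0044 U+003A U+005C
        Console.WriteLine(ToCodeUnits("\u202AD:\\"));
    }
}
```

With this format the leading U+202A stands out as a single code unit in front of the D.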

Is that a bug, or is that a feature?
My guess is, it's a buggy feature.

3 answers
孤傲高冷的网名
#2 · 2019-07-19 12:11

The issue is that the string does not begin with a letter D at all - it just looks like it does.

It appears that the string is hard-coded in your source file.

If that's the case, then you have pasted the string from the security dialog. Unbeknownst to you, the string you pasted begins with the LRE character (U+202A, LEFT-TO-RIGHT EMBEDDING). This is an invisible character which takes no space, but tells the renderer to lay out the text that follows from left to right, regardless of the usual rendering.

You just need to delete the character.

To do this, position the cursor AFTER the D in the string. Use the Backspace (delete-to-left) key to delete the D. Press it again to delete the invisible control character, and once more to delete the ". Now retype the " and the D.

A similar problem could occur wherever the string came from - e.g. from user input, command line, script file etc.
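
For paths that arrive from such sources, one option is to remove bidirectional formatting characters before handing the string to the file API. The following is only a sketch under stated assumptions: PathCleaner and StripBidiControls are hypothetical names, and it assumes that no real file name on the system legitimately contains these characters (a file system may allow them, in which case stripping would change which file is meant).

```csharp
using System;
using System.Text;

public static class PathCleaner
{
    // Bidirectional formatting characters:
    // U+200E LRM, U+200F RLM,
    // U+202A LRE, U+202B RLE, U+202C PDF, U+202D LRO, U+202E RLO,
    // U+2066 LRI, U+2067 RLI, U+2068 FSI, U+2069 PDI.
    private static bool IsBidiControl(char c)
    {
        return c == '\u200E' || c == '\u200F'
            || (c >= '\u202A' && c <= '\u202E')
            || (c >= '\u2066' && c <= '\u2069');
    }

    // Returns a copy of the path without bidirectional formatting characters.
    // Assumption: these characters are paste artifacts, not part of the name.
    public static string StripBidiControls(string path)
    {
        StringBuilder sb = new StringBuilder(path.Length);
        foreach (char c in path)
        {
            if (!IsBidiControl(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string pasted = "\u202AD:\\UserName\\Desktop\\no_image.png";
        string cleaned = StripBidiControls(pasted);
        // The cleaned string is one char shorter and starts with 'D'.
        Console.WriteLine(pasted.Length - cleaned.Length);
        Console.WriteLine(cleaned[0]);
    }
}
```

Rejecting such input with an error message instead of silently stripping it would be an equally reasonable design, since it makes the stray character visible to whoever supplied the path.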

Note: The security dialog shows the filename beginning with this control character to ensure that characters are displayed in left-to-right order, which is necessary for the hierarchy to be understood correctly when RTL characters are used. E.g. a filename c:\folder\path\to\file in Arabic might be c:\folder\مسار/إلى/ملف. The "gotcha" is that the Arabic parts read in the other direction, so the word "path", which according to Google Translate is مسار, is the rightmost word. That makes it appear as if it were the last element of the path, when in fact it is the element immediately after "c:\folder\".

Because security object paths have a hierarchy that conflicts with the RTL text layout rules, the security dialog always displays RTL text in LTR mode. That means that the Arabic words will be mangled (letters in the wrong order) on the security tab. (Imagine it as if it said "elif ot htap".) So the meaning is only just discernible, but from a security point of view the path semantics are preserved.

放我归山
#3 · 2019-07-19 12:31

Filenames that contain RLO/LRO overrides are commonly created by malware. E.g. “exe” read backwards spells “malware”. You probably have an infected host, or the origin of the .png is infected.

啃猪蹄的小仙女
#4 · 2019-07-19 12:35

This question bothered me a lot: how could a deterministic function give two different results for identical input? After some testing, it turns out that the answer is simple.

If you look through it in your debugger, you will see that the 'D' char in your @"‪D:\UserName\Desktop\cookie.png" (first use of Hex function) is NOT the same char as in @"D:\UserName\Desktop\cookie.png" (second use).

You must have used some other 'D'-like character, probably via an unintended keyboard shortcut or by messing with your Visual Studio character encoding.

It looks exactly the same, but in reality it's not even a single char (try watching the c variable in your ToHexString function).

If you change it to the normal 'D' in your first example, it will work fine.
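
The same check can be done without a debugger. The sketch below compares the two literals and looks for characters in the Unicode "Format" category, which is where U+202A belongs; InvisibleCharFinder and IndexOfFormatChar are hypothetical names, and the pasted string is reconstructed with an explicit \u202A escape rather than copied from the dialog:

```csharp
using System;
using System.Globalization;

public static class InvisibleCharFinder
{
    // Returns the index of the first character whose Unicode category is
    // Format (Cf), or -1 if the string contains none.
    public static int IndexOfFormatChar(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.GetUnicodeCategory(s[i]) == UnicodeCategory.Format)
            {
                return i;
            }
        }
        return -1;
    }

    public static void Main()
    {
        string pasted = "\u202AD:\\UserName\\Desktop\\cookie.png";
        string typed = "D:\\UserName\\Desktop\\cookie.png";

        Console.WriteLine(pasted.Length - typed.Length); // 1
        Console.WriteLine(IndexOfFormatChar(pasted));    // 0
        Console.WriteLine(IndexOfFormatChar(typed));     // -1
    }
}
```

The length difference of one shows that the two strings are not identical input, even though they render the same.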
