Windows Invariant Culture Puzzle

2019-05-20 20:16发布

I have a question about the windows invariant culture.

Succinctly, my question is:

does there exist any pair of characters c1, and c2 such that:

lower(c1, invariant) =latin-general lower(c2, Invariant)

but

lower(c1, invaraint) !=invariant lower(c2, invariant)

Background:

I need to store an invariant lower case string (representing a file name) inside of SQL Server Compact, which does not support windows invariant collations.

Ideally I would like to do this without having to pull all of my comparison logic out of the database and into my app.

The idea I had for solving this was to store 2 versions of all file names: one that is used for displaying data to the customer, and another that is used for performing comparisons. The comparison column would be converted to lower case using the windows invariant locale before storing it in the database.

However, I don't really have any idea what kind of mappings the invariant culture does, other than the fact that its what windows uses for comparing file names.

I'm wondering if it is possible to get false positives (or false negatives) as a result of this scheme.

That is, can I produce characters (previously lower cased using the invariant culture) that compare equal to each other using the latin-general-1 case insensitive SQL server collation, but do not compare equal to each other under the invariant culture?

If this can happen, then my app may consider 2 files that Windows thinks are different as being the same. This could ultimately lead to data loss.

NOTE:

I am aware that it is possible to have case sensitive files on Windows. I don't need to support those scenarios, however.

4条回答
来,给爷笑一个
2楼-- · 2019-05-20 20:41

Why not URL-encode the utf8 byte representation of the filename to get an ascii version which can be converted back to unicode easily without any possible loss?

查看更多
你好瞎i
3楼-- · 2019-05-20 20:42

why don't you convert filenames to ASCII? In your situation can filenames contain non-ascii characters?

查看更多
地球回转人心会变
4楼-- · 2019-05-20 20:57

By looking through the answers to this question:

win32-file-name-comparison

which I asked a while back.,

I found an indirect link the following page:

http://msdn.microsoft.com/en-us/library/ms973919.aspx

It suggests using an ordinal comparison after an invariant upper case as the best way to mimic what the file system does.

So I think if I use as "case sensitive, accent sensitive" collation in the database, and do a "upper" using the invariant local before storing the files I should be ok.

Does anyone know if there are any problems with that?

查看更多
我只想做你的唯一
5楼-- · 2019-05-20 20:58

"However, I don't really have any idea what kind of mappings the invariant culture does, other than the fact that its what windows uses for comparing file names."

I didn't think Windows used the invariant culture when comparing file names. For example if my culture is English then I can name two separate files turkish and TURKİSH, but if someone's culture is Turkish then I hope Windows won't let them do that.

查看更多
登录 后发表回答