Translate a URL to a valid file name and back to a URL

Posted 2019-06-24 15:12

I need to store some information that is unique to each site that my users access. (It is actually a thumbnail of the site that the user has looked at.)
This thumbnail (a JPEG file) needs a name indicating which site it represents, so that it can be viewed later on.

Can you recommend a simple translation from url to a valid file name and back?

Example: www.ibm.com could be mapped to www_ibm_com.

I am not sure that this will work for all valid URLs; in some cases URLs have very complex query strings.

Is there a good regex or C# library that can be used?

Thanks in advance and be happy.

Tags: c# url filenames
2 answers
Reply #2 · 2019-06-24 15:52

www.ibm.com is actually a valid filename. More problematic are slashes. So if the URL contains subdirectories, you'll need to translate the slashes.

The main problem then is possible duplicates. For example, if slashes are replaced with underscores, both ibm.com/path1_path2 and ibm.com/path1/path2 would translate to the same file name.
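To make the duplicate problem concrete, here is a minimal sketch. The class and method names (CollisionDemo, NaiveName) are made up for illustration, and the slash-to-underscore mapping is just the naive scheme from the question, not a recommendation.

```csharp
using System;

public static class CollisionDemo
{
    // Naive mapping (illustration only): every slash becomes an underscore.
    public static string NaiveName(string url)
    {
        return url.Replace('/', '_');
    }
}

// CollisionDemo.NaiveName("ibm.com/path1_path2")  returns "ibm.com_path1_path2"
// CollisionDemo.NaiveName("ibm.com/path1/path2")  returns "ibm.com_path1_path2"
```

Because two different URLs end up with the same name, the mapping cannot be reversed reliably.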

I like ChrisF's suggestion of finding a character that is legal in file names but not in URLs, although off the top of my head I don't know which character, if any, that would be.

If you don't find such a character, then you may need to stick with an unlikely character instead.

做个烂人
Reply #3 · 2019-06-24 15:58

Firstly it's worth pointing out that "." is perfectly legal in file names, but "/" isn't, so while the example you quote doesn't need translating, "www.ibm.com/path1/file1.jpg" would.

A simple string.Replace would be the best solution here - assuming you can find a character that's legal in a file name but illegal in a url.

Suppose that character is "§" (which may in fact be legal in a URL, so check before relying on it); then you've got:

string fileName = url.Replace("/", "§");

to translate to a file name and:

string url = fileName.Replace("§", "/");

to translate back.

This page on URL Encoding defines which characters are valid, invalid and unsafe (valid but with special meaning) in URLs. Characters in the "top half" of the ISO-Latin set, 80-FF hex (128-255 decimal), are not legal in URLs but might be OK in file names.

You will need to do this for each character in the URL that is in the set of invalid file name characters. You can get that set from Path.GetInvalidFileNameChars.
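If no single safe replacement character turns up, one variation is to escape each invalid character instead of swapping it. The sketch below is only that, a sketch: the names (UrlFileName, ToFileName, ToUrl), the choice of "%" as the escape marker and the four-hex-digit format are all my assumptions, not part of any library. It also ignores path length limits and reserved device names such as CON on Windows.

```csharp
using System;
using System.IO;
using System.Text;

public static class UrlFileName
{
    // Assumed escape marker. '%' is allowed in file names on Windows and Unix.
    private const char Escape = '%';

    // Replace every invalid file name character (and the marker itself)
    // with the marker followed by the character's code as 4 hex digits.
    public static string ToFileName(string url)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        var sb = new StringBuilder(url.Length);
        foreach (char c in url)
        {
            if (c == Escape || Array.IndexOf(invalid, c) >= 0)
                sb.Append(Escape).Append(((int)c).ToString("X4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Reverse of ToFileName. Throws FormatException if the digits after
    // a marker are not valid hex (i.e. the name was not produced above).
    public static string ToUrl(string fileName)
    {
        var sb = new StringBuilder(fileName.Length);
        for (int i = 0; i < fileName.Length; i++)
        {
            if (fileName[i] == Escape && i + 4 < fileName.Length)
            {
                string hex = fileName.Substring(i + 1, 4);
                sb.Append((char)Convert.ToInt32(hex, 16));
                i += 4; // skip the four hex digits just consumed
            }
            else
            {
                sb.Append(fileName[i]);
            }
        }
        return sb.ToString();
    }
}
```

With this scheme, UrlFileName.ToFileName("www.ibm.com/path1") gives "www.ibm.com%002Fpath1", and ToUrl turns it back into the original. Because the marker itself is escaped too, ibm.com/path1_path2 and ibm.com/path1/path2 no longer collide. Note that the set returned by Path.GetInvalidFileNameChars depends on the platform, so the same URL can produce different file names on Windows and Linux.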

UPDATE

Assuming that you can't find suitable character pairs, another solution would be to use a lookup table. One column holds the URL, the other the generated file name. As long as the generated name is unique (a GUID would do), you can do a two-way lookup to get from one to the other.
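A minimal in-memory sketch of such a lookup table might look like this. The class and member names (ThumbnailNameTable, GetOrAddFileName, TryGetUrl) and the ".jpg" suffix are assumptions for illustration; a real application would presumably persist the table (a database table or a file) rather than keep it in dictionaries.

```csharp
using System;
using System.Collections.Generic;

public sealed class ThumbnailNameTable
{
    // Two dictionaries give the two-way lookup: URL -> file name and back.
    private readonly Dictionary<string, string> urlToFile = new Dictionary<string, string>();
    private readonly Dictionary<string, string> fileToUrl = new Dictionary<string, string>();

    // Returns the file name already assigned to the URL,
    // or generates a new GUID-based name and records it.
    public string GetOrAddFileName(string url)
    {
        string fileName;
        if (!urlToFile.TryGetValue(url, out fileName))
        {
            fileName = Guid.NewGuid().ToString("N") + ".jpg";
            urlToFile[url] = fileName;
            fileToUrl[fileName] = url;
        }
        return fileName;
    }

    // Looks up the URL that a generated file name belongs to.
    public bool TryGetUrl(string fileName, out string url)
    {
        return fileToUrl.TryGetValue(fileName, out url);
    }
}
```

The generated names contain only hex digits and a fixed extension, so they are valid file names regardless of what the URL contains, at the cost of having to keep the table around.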
