URL rewriting - international letters

Posted 2019-08-17 02:12

How should I format URLs with special/international characters?

Currently I try to make URLs "look good", so that:

www.myhost.com/this is a test, do you know how?

is converted to:

www.myhost.com/this_is_a_test_do_you_know_how

I know some international letters can be converted (ü = ue, æ = ae, å = aa), and some characters can simply be removed. In general I try to make the URL look "good", but is that a bad idea?

But what do I do with Chinese, Japanese, or Arabic characters that have nothing to do with the Western ASCII character set?

I really don't like the idea of rewriting the URL with hex codes, so right now I just use my internal unique ID if the URL contains too many "non-convertible" characters.
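
A minimal Python sketch of the approach described above: transliterate what can be transliterated, drop the rest, and fall back to the internal ID when too little of the title survives. The function name make_slug, the replacement table, and the 50% threshold are all assumptions for illustration, not something from the original post.

```python
import re
import unicodedata

# Hand-picked replacements in the spirit of the question (ü = ue, æ = ae, å = aa).
# The table is an assumption; extend it per language as needed.
REPLACEMENTS = {"ü": "ue", "æ": "ae", "å": "aa", "ø": "oe", "ö": "oe", "ä": "ae", "ß": "ss"}


def make_slug(title, fallback_id, min_ratio=0.5):
    """Return an ASCII slug, or the fallback ID when too little of the title survives."""
    text = title.lower()
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # Split accented letters into base letter + combining mark, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", text)
    ascii_text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", ascii_text).strip("_")
    # Compare letters/digits in the original title with those left in the slug.
    original_count = sum(ch.isalnum() for ch in title)
    kept_count = sum(ch.isalnum() for ch in slug)
    if original_count == 0 or kept_count / original_count < min_ratio:
        return str(fallback_id)
    return slug


print(make_slug("this is a test, do you know how?", 42))
print(make_slug("日本語のタイトル", 42))
```

The first call reproduces the example URL from the question; the second falls back to the ID because nothing in the title maps to ASCII.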

4 answers
祖国的老花朵
#2 · 2019-08-17 02:44

If you're using .NET, why not use:

Server.UrlEncode( myURL );

But if you want to keep Scandinavian characters, or any other characters, you just need to set up the rule in your URL rewriting component. The DynamicWeb CMS software, for example, keeps all available characters and only replaces spaces with underscores ('_').

Like this URL:

http://www.gynækologen.dk/Undersøgelser_og_behandlinger.aspx

You can see the æ in the domain as well as the ø in the page name.
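
For what it's worth, a URL like that is still sent over the wire as plain ASCII: the host name is converted to its IDNA (Punycode) form and the path is percent-encoded as UTF-8. Here is a rough Python sketch of that conversion. The helper name to_wire_form is made up for this example, and it assumes an absolute URL with no port or credentials.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_wire_form(url):
    """Convert a URL with non-ASCII host/path into an ASCII-only equivalent."""
    parts = urlsplit(url)
    # Host labels containing non-ASCII letters become "xn--..." (IDNA/Punycode).
    # Assumes the URL is absolute; port and credentials are not handled here.
    host = parts.hostname.encode("idna").decode("ascii")
    # Path characters outside ASCII become percent-encoded UTF-8 bytes.
    path = quote(parts.path, safe="/")
    # Query and fragment are passed through untouched in this sketch.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(to_wire_form("http://www.gynækologen.dk/Undersøgelser_og_behandlinger.aspx"))
```

So the "pretty" form is mostly a display matter: the browser shows æ and ø, while the server sees the encoded version.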

Evening l夕情丶
#3 · 2019-08-17 02:48

But doesn't Google take advantage of the URL? If some of the text from a given article is in the URL, won't the Google search engine use that? And if there really is no good way of handling non-ASCII letters, are those languages then given lower priority on the "Google internet"?

闹够了就滚
#4 · 2019-08-17 02:54

Have a look at, say, http://ja.wikipedia.org/. If you mouse over the links, they show up in the status bar as Japanese characters. They don't look so Japanese in the location bar when you follow the link, but that possibly can't be helped. I haven't checked, but I assume it's all percent-encoded UTF-8.
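
If the links are percent-encoded UTF-8, turning them back into readable text is a one-liner in most languages. A small Python sketch follows; the /wiki/ path is just an illustrative example, not a link taken from the page.

```python
from urllib.parse import unquote

# Percent-encoded form, as it might appear in the location bar.
encoded = "https://ja.wikipedia.org/wiki/%E6%97%A5%E6%9C%AC"

# unquote() decodes the %XX escapes as UTF-8 by default.
readable = unquote(encoded)
print(readable)  # https://ja.wikipedia.org/wiki/日本
```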

看我几分像从前
#5 · 2019-08-17 03:00

What language are you using? PHP includes a function filter_var() that seems to do most of what you want. See http://us.php.net/manual/en/function.filter-var.php.

In general, the cost of making human-readable ASCII strings from arbitrary string input is probably too great to be worth it. If the user gives you a Chinese hanzi, what are you going to do? Look it up in a dictionary and output the result in pinyin?

The best, most general solution is simply to take the input, format it as UTF-8, then url-encode the result. This will make non-Latin text unreadable, but there is no good, general solution for those languages anyway. The language you're using almost certainly has library functions that can make this easy.
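
As a rough illustration of "format it as UTF-8, then url-encode", here is how that looks in Python. The question doesn't say which language is in use, so Python and the helper name encode_segment are assumptions made for this example.

```python
from urllib.parse import quote, unquote


def encode_segment(title):
    """Encode arbitrary text as a single URL path segment."""
    # Step 1: format the input as UTF-8 bytes.
    raw = title.encode("utf-8")
    # Step 2: percent-encode every byte that is not an unreserved character.
    # safe="" means "/" is escaped too, so the result stays one segment.
    return quote(raw, safe="")


print(encode_segment("你好"))            # %E4%BD%A0%E5%A5%BD
print(encode_segment("this is a test"))  # this%20is%20a%20test

# Decoding restores the original text exactly.
print(unquote(encode_segment("你好")))
```

The result is not readable for non-Latin text, as noted above, but it is lossless and works for any input.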
