Special Characters & URL Rewriting

Posted 2019-06-17 03:51

Question:

I am currently working on an application that pulls JSON data from the Blizzard Community API and parses it with PHP. Everything works until I come to a character with a special character in their name.

In order to pull a character's data I need to know the character's name and the realm they are on.

The name and realm are passed through the URL to the character page, which uses them to pull the character data.

At this point my URLs are like so:

 http://localhost/guildtree/characters.php?realm=argent-dawn&name=Ankzu

At this point if I try to pull data for a character with an accent I get re-directed to my error page because it is not a valid character.

It wasn't until I started the URL rewriting that I discovered my problem. I am being re-directed to my error page because somewhere along the line the special characters are being substituted for some really wonky characters.

With my new rewritten URLs the following works:

 http://localhost/guildtree/argent-dawn/ankzu

However, a character with a special character in their name results in an error message.

 http://localhost/guildtree/argent-dawn/notúk

Results in the following error message:

"Not Found

The requested URL /guildtree/argent-dawn/notÃºk was not found on this server."

As you can see, the ú is being replaced with Ãº, but when I copy and paste the URL the ú appears as %C3%BA.

It is my understanding that the ú appears as Ãº because the two-byte UTF-8 encoding of ú is being read as two separate single-byte characters.
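The effect described above can be reproduced in a few lines of PHP. This is only a sketch: it assumes the mbstring extension is available and that the single-byte charset involved is ISO-8859-1, which the question does not confirm.

```php
<?php
// The UTF-8 encoding of "ú" (U+00FA) is the two bytes 0xC3 0xBA.
$utf8 = "\xC3\xBA";

// Percent-encoding those two bytes gives the form seen when the URL is copied.
$percentEncoded = rawurlencode($utf8);

// Assumption: the bytes are misread as ISO-8859-1. Each byte then becomes
// its own character: 0xC3 is "Ã" and 0xBA is "º".
$misread = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $percentEncoded, "\n"; // %C3%BA
echo $misread, "\n";        // Ãº
```

The same two bytes are behind both forms; only the interpretation differs.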

I have ensured that all my pages have the following in the header:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

In order for my application to work properly I need those special characters to come through intact: the ú must not appear as Ãº but actually be ú or %C3%BA.

The character's name is being pulled from the URL simply as:

$charName = $_GET['name'];

Is it possible to encode $charName to display the special characters properly?

I have tried everything I can think of and have searched on Google but nothing has worked.

Also, because I am using URL rewriting what would the rewrite rule be to allow for these special characters?

Here is my current rewrite rule:

 RewriteRule ^([a-zA-Z0-9_'-]+)/([a-zA-Z]+)$        characters.php?realm=$1&name=$2     [NC]

I'm aware that ([a-zA-Z]+) does not allow for special characters at all; so far I have been working on getting the special characters to display properly. If I use ([a-zA-Z\ú]+) it works and displays the page as it should. However, adding \ú to the rule seems like a very poor way to do this, and the same approach does not always work when I add the corresponding character for other accented letters.

Any help would be greatly appreciated. If you require any more information please ask.

Edit:

Changing my rewrite rule to the below allows for the information to be pulled fine, but creates a redirect loop for my CSS.

 RewriteRule ^([a-zA-Z0-9_'-]+)/([^/]+)$        characters.php?realm=$1&name=$2 [NC]

For example my CSS is being redirected to

http://localhost/guildtree/css/error

instead of

http://localhost/guildtree/css/style2.css

Update:

In a few simple tests, the following makes the change:

$charName = $_GET['name'];
$charNameTEST = utf8_encode($charName);

But when I apply this to my page it still comes up saying:

"Not Found

The requested URL /guildtree/argent-dawn/notÃºk was not found on this server."

I think the main issue now is with the URL redirecting, because the JSON data can be parsed perfectly fine when it has the accented characters. I just don't understand why the browser bar shows guildtree/argent-dawn/notúk while the server keeps trying to pull up /guildtree/argent-dawn/notÃºk.

Answer 1:

ú is not a valid character in a URL.

Wherever you link the username, you should URL-encode it.

Hence the correct URL to point to is:

http://localhost/guildtree/argent-dawn/not%C3%BAk

You should print it in php as:

echo '<a href="http://localhost/guildtree/argent-dawn/'. urlencode($name) .'">Link</a>';
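A slightly fuller sketch of the same idea follows. The helper name `characterLink` is hypothetical, and it swaps in `rawurlencode()` rather than `urlencode()`, on the assumption that the name ends up in a path segment, where a space should become %20 rather than +. For ú the two functions give the same result.

```php
<?php
// Hypothetical helper: builds the link to a character page.
function characterLink(string $realm, string $name): string
{
    // Percent-encode each path segment separately so "/" stays a separator.
    $path = rawurlencode($realm) . '/' . rawurlencode($name);

    // Escape for HTML so the attribute value and the link text are safe.
    $href = htmlspecialchars('http://localhost/guildtree/' . $path, ENT_QUOTES, 'UTF-8');
    $text = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $href . '">' . $text . '</a>';
}

// "not\xC3\xBAk" is "notúk" written as explicit UTF-8 bytes.
echo characterLink('argent-dawn', "not\xC3\xBAk"), "\n";
```

The link target comes out percent-encoded (not%C3%BAk) while the visible link text keeps the accented letter.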


Answer 2:

I think this question might have your answer. I have not tried this myself, but from what I can see, you'd need to rewrite your RewriteRule as:

RewriteRule ^([a-zA-Z0-9_'-]+)/([a-zA-Z]+)$        characters.php?realm=$1&name=$2     [NC,B]

The B flag will ensure that the special characters are URL-escaped, so the value seen by name in $2 would be percent-encoded. Since you aren't doing a redirect, the original Unicode character should still be what's displayed in the URL.

You'll also need some changes to the regex to ensure it matches Unicode characters. I'm not sure what those would be.
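One possible way around the regex question is to loosen the rewrite pattern to something like ([^/]+), as in the question's edit, and do the Unicode-aware check in PHP instead. The sketch below uses a hypothetical `isValidCharacterName` function and assumes that valid names consist only of letters; neither comes from the original posts.

```php
<?php
// Hypothetical validator for the decoded name that reaches characters.php.
function isValidCharacterName(string $name): bool
{
    // \p{L} matches a letter in any script. The /u modifier makes PCRE treat
    // pattern and subject as UTF-8; on invalid UTF-8 preg_match() returns
    // false, so the strict comparison below rejects it as well.
    return preg_match('/^\p{L}+\z/u', $name) === 1;
}

var_dump(isValidCharacterName("not\xC3\xBAk"));  // "notúk": letters only
var_dump(isValidCharacterName('style2.css'));    // contains a digit and a dot
var_dump(isValidCharacterName("not\xC3k"));      // truncated UTF-8 sequence
```

With this split the rewrite rule only routes the request, and the decision about which characters are acceptable stays in one place in the PHP code.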

There is also some more description of how unicode characters work in URLs over here.



Answer 3:

To get this to work properly you need to do two things.

Firstly, add this to your .htaccess:

AddDefaultCharset On
AddDefaultCharset UTF-8
AddCharset UTF-8 .tpl
AddCharset UTF-8 .js
AddCharset UTF-8 .css
AddCharset UTF-8 .php

Secondly change the part of your rewrite rule that needs to allow the special characters to (.*) like so:

 RewriteRule ^([a-zA-Z0-9_'-]+)/(.*)$       characters.php?realm=$1&name=$2     [NC]

This will cause some redirect loops for other pages, but I am working on fixing that at the moment.
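A common way to avoid this kind of loop is to skip the rule whenever the request maps to a file or directory that really exists, so stylesheets and scripts are served directly. The fragment below is a sketch under that assumption (that css/style2.css and similar assets exist as real files under the site root). It also adds the B and L flags, which are not part of the rule above.

```apache
RewriteEngine On

# Only rewrite when the request is not an existing file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# [^/]+ accepts any non-slash bytes, including multi-byte UTF-8 sequences.
# B re-escapes the captured values; L stops processing further rules.
RewriteRule ^([a-zA-Z0-9_'-]+)/([^/]+)$ characters.php?realm=$1&name=$2 [NC,B,L]
```

The two RewriteCond lines apply only to the RewriteRule that immediately follows them, so any other rules in the same file need their own conditions.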