PHP allows Unicode identifiers for variables, functions, classes and constants. This was presumably intended for localized applications. Whether it's a good idea to code an API in anything but English is debatable, but it's undisputed that some development settings could demand it.
$Schüssel = new Müsli(T_FRÜCHTE);
But PHP allows more than just \p{L} for identifiers. You can use virtually any Unicode character, except those from the ASCII range (e.g. : is special, as is \, which is already used as an internal hack to support namespaces).
Anyway, you could do so, and I would even consider that a workable use for fun projects:
throw new ಠ_ಠ("told you about the disk space before");
But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?
For example, I'm pondering this for embedding parameters into magic method names. In my case I only need to inject numeric parameters, so I could get away with just the underscore:
$what->substr_0_50->ascii("text");
// (Let's skip the evilness discussion this time. Not quite sure
// yet if I really want it, but the conciseness might make sense.)
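To make the idea concrete, here is a minimal sketch of how such a name could be taken apart in __get. The TextChain class, its $ops queue and the behaviour of ascii() are assumptions made up for illustration, not part of any existing library, and it needs PHP 7.1+ for the array destructuring in foreach:

```php
<?php
// Hypothetical wrapper class; name and behaviour are illustrative assumptions.
final class TextChain
{
    private $ops = [];

    // ->substr_0_50 is not a real property, so PHP routes the access here.
    public function __get($name)
    {
        $parts = explode('_', $name);          // "substr_0_50" => ["substr", "0", "50"]
        $func  = array_shift($parts);          // the operation name
        $args  = array_map('intval', $parts);  // numeric parameters only
        $this->ops[] = [$func, $args];
        return $this;                          // allow further chaining
    }

    // Terminal call: apply the queued operations, then drop non-ASCII bytes.
    public function ascii($text)
    {
        foreach ($this->ops as [$func, $args]) {
            if ($func === 'substr') {
                $text = substr($text, ...$args);
            }
        }
        $this->ops = [];
        return preg_replace('/[^\x00-\x7F]/', '', $text);
    }
}

$what = new TextChain();
echo $what->substr_0_4->ascii("text and more"), "\n"; // prints "text"
```

The underscore works as a separator here only because the parameters are numeric; a textual parameter containing an underscore would be split in the wrong place.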
But if I wanted to embed other textual parameters, I would require another Unicode character. Now that's harder to type, but if there's one that would aid readability and convey the meaning ... ?
->substr✉0✉50-> // doesn't look good
So, the question in this case: Which symbol makes sense as separator for mixed-in parameters in a virtual function name. -- Broader meta topic: Which uses of Unicode identifiers do you know about, or would you consider okayish?
Which symbol makes sense as separator for mixed-in parameters in a virtual function name.
\u2639?
But other than localization and amusement and decorative effects, which uses of Unicode identifiers are advisable?
The biggest hurdle after font support is going to be picking a character that can actually be typed. Outside of a macro or copy/paste, Unicode characters are not spectacularly easy to enter. Forcing this upon others is very likely going to violate the "assume the people that work with your code after you are murderous psychopaths that know where you live" rule.
We use Unicode characters in only a few comments in our codebase, like
// Even though this is the end of the file and we should get an implicit exit,
// if we don't actually expressly exit here, PHP segfaults.
// ♫ Oh, PHP, I love you. ♫
I think that falls into the "amusement and decorative" category. Or the "shoot self in head after slaughtering the php-internals team" category. Pick one.
Anyway, this is not a good idea because it's going to make your code hard to modify.
Just to make it clear: PHP does not support Unicode. And it doesn't support Unicode labels. To be more precise, PHP defines a LABEL as [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. As you can see, it allows only a small range of bytes apart from the typical alphanumerics and the underscore. The fact that your Unicode labels are still accepted is merely an artifact of PHP's lack of Unicode support. Your special characters are several bytes long in UTF-8, and PHP treats each of these bytes as a separate character. Each of those bytes happens to fall within the \x7f-\xff range mentioned above, which is true of every byte in a multi-byte UTF-8 sequence.
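A quick way to check this yourself is to run the LABEL pattern quoted above against the raw bytes of a name. This is only a sketch: the ^ and $ anchors and the variable names are my additions, and the identifier is spelled with byte escapes so the result does not depend on the encoding of the script file:

```php
<?php
// LABEL pattern as quoted above, anchored to match the whole string.
// No /u modifier on purpose: the match runs on raw bytes, as the lexer does.
$label = '/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/';

// "Schüssel" in UTF-8; the ü is encoded as the two bytes C3 BC.
$name = "Sch\xC3\xBCssel";

var_dump(strlen($name));              // int(9): nine bytes for eight characters
var_dump(bin2hex("\xC3\xBC"));        // string(4) "c3bc": both bytes are above 0x7f
var_dump(preg_match($label, $name));  // int(1): accepted byte by byte
var_dump(preg_match($label, "a:b"));  // int(0): the ASCII colon is not in the class
```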
Further reading on that topic: Exotic names for methods, constants, variables and fields - Bug or Feature?