Recently, using C#, I declared a method parameter using the Latin character ñ, built (compiled) my entire solution, and it worked, so I was able to run my program. But I'm curious to know whether it is wrong to use special characters such as Latin characters in C# source code. If it is wrong, why?
Besides the fact that it is more legible and universal to write code in English, are there any other reasons not to use special characters in C# source code?
Let me break this down into several questions.
Is it legal according to the specification to use non-Roman letters in C# identifiers, strings, and so on?
Yes, absolutely. Any character that the Unicode specification classifies as a letter is legal. See the specification for the exact details.
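For example, here is a minimal sketch (the names are mine, invented purely for illustration) that any recent C# compiler accepts:

    // Identifiers containing non-Roman letters -- all legal per the spec.
    class Señal
    {
        public int Año { get; set; }

        // A parameter named with ñ, as in the question.
        public void Avanzar(int año)
        {
            Año += año;
        }
    }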
Are there any technical issues regarding non-Roman letters in C# programs?
Yes, there are a few. As you are probably aware, you can both "statically" and "dynamically" link code into an application, and the compiler is an application. We've had problems in the past where the compiler had a statically-linked-in old version of the Unicode classification algorithm, and the editor had a dynamically-linked-in current version, and so the editor and the compiler could disagree on what is a legal letter, which can cause user confusion. However, the accented Latin characters you mention have been in the Unicode standard for so long that they are unlikely to cause any sort of problem.
Moreover, a lot of people still use old-fashioned editors; I learned how to program at WATCOM back in the late 1980's and I still frequently use WATCOM VI as my editor. I can sometimes code faster in it than I can in Visual Studio because my fingers are just really good at it after 23 years of practice. (Though these days I use Visual Studio for almost everything.) Obviously an editor written in the 1980's is going to have a problem with Unicode.
Are there any non-technical issues regarding non-Roman letters in C# programs?
Obviously, yes. I personally would rather use Greek letters for generic type parameters, for instance:
class List<τ> : IEnumerable<τ>
or when implementing mathematical code:
degrees = 180.0 * radians / π;
But I resist the urge in deference to my coworkers who do not particularly want to be cutting and pasting, or learning arcane key combinations, just to edit my code.
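For what it's worth, here is a compilable sketch of both ideas put together; the names are invented for illustration and not from any real codebase:

    using System;
    using System.Collections.Generic;

    // A Greek letter as a generic type parameter.
    class Bag<τ>
    {
        private readonly List<τ> items = new List<τ>();
        public void Add(τ item) => items.Add(item);
        public int Count => items.Count;
    }

    static class Trig
    {
        // π as an identifier; U+03C0 is a Unicode letter (class Ll).
        const double π = Math.PI;

        static double ToDegrees(double radians) => 180.0 * radians / π;

        static void Main()
        {
            var bag = new Bag<string>();
            bag.Add("hello");
            Console.WriteLine(bag.Count);                // 1
            Console.WriteLine(ToDegrees(Math.PI / 2.0)); // 90
        }
    }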
Added this first bit based on the comment:
This doesn't answer the question... The OP isn't asking whether it is
allowed (obviously it is), but whether it's wrong – Thomas Levesque
Ok, let me address it more directly:
Is it wrong to use special characters such as Latin characters in source code written in C#? If it is wrong, why?
According to the specification, it is not "wrong" (see below).
Besides the fact that it is more legible and universal to write code in English, are there any other reasons not to use special characters in C# source code?
Since you said "Besides", I'm not going to address the legibility nor "universality" topics (as is appropriate for a StackOverflow question anyways). To your other part: "are there any other reason to not use special characters"... Since I'm ignoring the first things you mentioned, I have to say I can't think of many. The only thing I can think of is; We still (amazingly) have problems with some tools supporting Unicode today (off-brand third party tools, mostly) it MAY be that you use some wacky tool which doesn't handle unicode correctly, or doesn't conform to the C# spec correctly - but I haven't come across any. So, I'd say no. (Keeping in mind you specifically said I didn't have to address to legibility or universality topics).
From the C# ECMA Specification Page 70:
The rules for identifiers given in this subclause correspond exactly to those recommended by the Unicode Standard Annex 15 except
that underscore is allowed as an initial character (as is traditional
in the C programming language), Unicode escape sequences are permitted
in identifiers, and the “@” character is allowed as a prefix to enable
keywords to be used as identifiers.
    identifier::
        available-identifier
        @ identifier-or-keyword

    available-identifier::
        An identifier-or-keyword that is not a keyword

    identifier-or-keyword::
        identifier-start-character identifier-part-characters(opt)

    identifier-start-character::
        letter-character
        _ (the underscore character U+005F)

    identifier-part-characters::
        identifier-part-character
        identifier-part-characters identifier-part-character

    identifier-part-character::
        letter-character
        decimal-digit-character
        connecting-character
        combining-character
        formatting-character

    letter-character::
        A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
        A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl
The important bit there is how the spec defines a letter-character.
It specifically includes: A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
The character you mention, ñ (U+00F1, LATIN SMALL LETTER N WITH TILDE), belongs to the category "Ll" (Letter, Lowercase), which is specifically allowed by the specification in an identifier.
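To make those rules concrete, here is a small sketch (with made-up names) that exercises each of them:

    class Demo
    {
        int _count;    // underscore allowed as an initial character
        int @class;    // @ prefix lets a keyword be used as an identifier
        int año;       // ñ (U+00F1) is a letter-character of class Ll

        void Touch()
        {
            // Unicode escape sequences are permitted in identifiers:
            // \u00F1 is ñ, so this refers to the same field as año.
            a\u00F1o = 1;
            _count = @class + año;
        }
    }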
Playing around at home, I'll often name Func parameters λ because it amuses me to do so.
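For example (a toy snippet, nothing more):

    using System;

    class Toy
    {
        static void Main()
        {
            // λ (U+03BB) is a perfectly legal parameter name.
            Func<int, int> square = λ => λ * λ;
            Console.WriteLine(square(7)); // 49
        }
    }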
For code anyone would see, I wouldn't make someone have a harder job typing just because it amuses me to use a non-Latin letter in a given case. That's not the place for such amusement.
With a perfectly normal Latin letter like ñ, I'd have no qualms about using it if I had a good reason to use a loan-word it appears in. That said, it's never come up. About the only loan-word with a diacritic I've ever used in coding is façade, but it has been in use in a computing context for so long, and is so often seen in the form facade, that I think of facade as a computing word derived from façade, in much the same way that I think of color as a computing word for colour even though the latter is the spelling used in the form of English I use. In written English I would only ever use façade and colour.
I personally prefer every piece of code and every comment to be written in English only, and English isn't even my native language. I just think it's better for communication if everybody writes code in the same language.
It's extremely painful when you have to translate variable names or comments around a piece of code you're debugging from a language of which you don't know a single word.
Another point is that the language itself is written in English.
Of course it's a personal preference.
As long as it compiles, I think it's OK to use what English speakers call special characters. I live in Sweden, and here we have the characters ÅÄÖ, which don't exist in English. Many people use ÅÄÖ in their programs so that a Swedish developer can understand the code. Sometimes a word has no good English translation, and then the Swedish word is more descriptive.
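For example, a made-up sketch (the domain terms and names are hypothetical, not from any real Swedish codebase):

    using System;

    class Kund  // "customer"
    {
        public string Förnamn { get; set; }  // first name
        public int Ålder { get; set; }       // age

        public bool ÄrMyndig() => Ålder >= 18;  // "is of legal age"

        static void Main()
        {
            var kund = new Kund { Förnamn = "Åsa", Ålder = 30 };
            Console.WriteLine($"{kund.Förnamn}: {kund.ÄrMyndig()}");  // Åsa: True
        }
    }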