Why are private fields private to the type, not th

2020-01-24 03:58发布

问题:

In C# (and many other languages) it's perfectly legitimate to access private fields of other instances of the same type. For example:

public class Foo
{
    private bool aBool;

    public void DoBar(Foo anotherFoo)
    {
        if (anotherFoo.aBool) ...
    }
}

As the C# specification (sections 3.5.1, 3.5.2) states access to private fields is on a type, not an instance. I've been discussing this with a colleague and we're trying to come up with a reason why it works like this (rather than restricting access to the same instance).

The best argument we could come up with is for equality checks where the class may want to access private fields to determine equality with another instance. Are there any other reasons? Or some golden reason that absolutely means it must work like this or something would be completely impossible?

回答1:

I think one reason it works this way is because access modifiers work at compile time. As such, determining whether or not a given object is also the current object isn't easy to do. For example, consider this code:

public class Foo
{
    private int bar;

    public void Baz(Foo other)
    {
        other.bar = 2;
    }

    public void Boo()
    {
        Baz(this);
    }
}

Can the compiler necessarily figure out that other is actually this? Not in all cases. One could argue that this just shouldn't compile then, but that means we have a code path where a private instance member of the correct instance isn't accessible, which I think is even worse.

Only requiring type-level rather than object-level visibility ensures that the problem is tractable, as well as making a situation that seems like it should work actually work.

EDIT: Danilel Hilgarth's point that this reasoning is backwards does have merit. Language designers can create the language they want, and compiler writers must conform to it. That being said, language designers do have some incentive to make it easier for compiler writers to do their job. (Though in this case, it's easy enough to argue that private members could then only be accessed via this (either implicitly or explicitly)).

However, I believe that makes the issue more confusing than it needs to be. Most users (myself included) would find it unneccessarily limiting if the above code didn't work: after all, that's my data I'm trying to access! Why should I have to go through this?

In short, I think I may have overstated the case for it being "difficult" for the compiler. What I really meant to get across is that above situation seems like one that the designers would like to have work.



回答2:

Because the purpose of the kind of encapsulation used in C# and similar languages* is to lower mutual dependence of different pieces of code (classes in C# and Java), not different objects in memory.

For example, if you write code in one class that uses some fields in another class, then these classes are very tightly coupled. However, if you are dealing with code in which you have two objects of the same class, then there is no extra dependency. A class always depends on itself.

However, all this theory about encapsulation fails as soon as someone creates properties (or get/set pairs in Java) and exposes all the fields directly, which makes classes as coupled as if they were accessing fields anyway.

*For clarification on kinds of encapsulation see Abel's excellent answer.



回答3:

Quite some answers have already been added to this interesting thread, however, I didn't quite find the real reason for why this behavior is the way it is. Let me give it a try:

Back in the days

Somewhere between Smalltalk in the 80's and Java in the mid 90's the concept of object-orientation matured. Information hiding, not originally thought of as a concept only available to OO (mentioned first in 1978), was introduced in Smalltalk as all data (fields) of a class is private, all methods are public. During the many new developments of OO in the 90's, Bertrand Meyer tried to formalize much of the OO concepts in his landmark book Object Oriented Software Construction (OOSC) which has since then be considered an (almost) definitive reference on OO concepts and language design.

In the case of private visibility

According to Meyer a method should be made available to a defined set of classes (page 192-193). This gives obviously a very high granularity of information hiding, the following feature is available to classA and classB and all their descendants:

feature {classA, classB}
   methodName

In the case of private he says the following: without explicitly declaring a type as visible to its own class, you cannot access that feature (method/field) in a qualified call. I.e. if x is a variable, x.doSomething() is not allowed. Unqualified access is allowed, of course, inside the class itself.

In other words: to allow access by an instance of the same class, you have to allow the method access by that class explicitly. This is sometimes called instance-private versus class-private.

Instance-private in programming languages

I know of at least two languages currently in use that use instance-private information hiding as opposed to class-private information hiding. One is Eiffel, a language designed by Meyer, that takes OO to its utmost extremes. The other being Ruby, a far more common language nowadays. In Ruby, private means: "private to this instance".

Choices for language design

It has been suggested that allowing instance-private would be hard for the compiler. I don't think so, as it is relatively simple to just allow or disallow qualified calls to methods. If for a private method, doSomething() is allowed and x.doSomething() is not, a language designer has effectively defined instance-only accessibility for private methods and fields.

From a technical point of view, there's no reason to choose one way or the other (esp. when considering that Eiffel.NET can do this with IL, even with multiple inheritance, there's no inherent reason not to provide this feature).

Of course, it's a matter of taste and as others already mentioned, quite some methods might be harder to write without the feature of class-level visibility of private methods and fields.

Why C# allows only class encapsulation and not instance encapsulation

If you look at internet threads on instance encapsulation (a term sometimes used to refer to the fact that a language defines the access modifiers on instance level, as opposed to class level), the concept is often frowned upon. However, considering that some modern languages use instance encapsulation, at least for the private access modifier, makes you think it can be and is of use in the modern programming world.

However, C# has admittedly looked hardest at C++ and Java for its language design. While Eiffel and Modula-3 were also in the picture, considering the many features of Eiffel missing (multiple inheritance) I believe they chose the same route as Java and C++ when it came to the private access modifier.

If you really want to know the why you should try to get a hold of Eric Lippert, Krzysztof Cwalina, Anders Hejlsberg or anyone else who worked on the standard of C#. Unfortunately, I couldn't find a definitive note in the annotated The C# Programming Language.



回答4:

This is only my opinion, but pragmatically, I think that if a programmer has access to the source of a class, you can reasonably trust them with accessing the class instance's private members. Why bind a programmers right hand when in their left you've already given them the keys to the kingdom?



回答5:

The reason indeed is equality check, comparison, cloning, operator overloading... It would be very tricky to implement operator+ on complex numbers, for example.



回答6:

First of all, what would happen to private static members? Can they only be accessed by static methods? You certainly wouldn't want that, because then you wouldn't be able to access your consts.

As to your explicit question, consider the case of a StringBuilder, which is implemented as a linked list of instances of itself:

public class StringBuilder
{
    private string chunk;
    private StringBuilder nextChunk;
}

If you can't access the private members of other instances of your own class, you have to implement ToString like this:

public override string ToString()
{
    return chunk + nextChunk.ToString();
}

This will work, but it's O(n^2) -- not very efficient. In fact, that probably defeats the whole purpose of having a StringBuilder class in the first place. If you can access the private members of other instances of your own class, you can implement ToString by creating a string of the proper length and then doing an unsafe copy of each chunk to its appropriate place in the string:

public override string ToString()
{
    string ret = string.FastAllocateString(Length);
    StringBuilder next = this;

    unsafe
    {
        fixed (char *dest = ret)
            while (next != null)
            {
                fixed (char *src = next.chunk)
                    string.wstrcpy(dest, src, next.chunk.Length);
                next = next.nextChunk;
            }
    }
    return ret;
}

This implementation is O(n), which makes it very fast, and is only possible if you have access to private members of other instances of your class.



回答7:

This is perfectly legitimate in many languages (C++ for one). The access modifiers come from the encapsulation principle in OOP. The idea is to restrict access to outside, in this case outside is other classes. Any nested class in C# for example can access it's parents private members also.

While this is a design choice for a language designer. The restriction of this access can complicate some very common scenarios extremely without contributing much to isolation of entities.

There is a similar discussion here



回答8:

I don't think that there's a reason we couldn't add another level of privacy, where data is private to each instance. In fact, that might even provide a nice feeling of completeness to the language.

But in actual practice, I doubt it would really be that useful. As you pointed out, our usual private-ness it is useful for things such as equality checks, as well as most other operations involving multiple instances of a Type. Although, I also like your point about maintaining data abstraction, as that is an important point in OOP.

I think all around, providing the ability to restrict access in such a way could be a nice feature to add to OOP. Is it really that useful? I'd say no, since a class should be able to trust its own code. Since that class is the only things that can access private members, there's no real reason to need data abstraction when dealing with an instance of another class.

Of course, you can always write your code as if private applied to instances. Use the usual get/set methods to access/change the data. That would probably make the code more manageable if the class may be subject to internal changes.



回答9:

Great answers given above. I would add that part of this problem is the fact that instantiating a class inside itself is even allowed in the first place. It makes since in a recursive logic "for" loop, for example, to use that type of trick as long as you have logic to end the recursion. But instantiating or passing the same class inside itself without creating such loops logically creates its own dangers, even though its a widely accepted programming paradigm. For example, a C# Class can instantiate a copy of itself in its default constructor, but that doesn't break any rules or create causal loops. Why?

BTW....this same problem applies to "protected" members as well. :(

I never accepted that programming paradigm fully because it still comes with a whole suite of issues and risks that most programmers don't fully grasp until issues like this problem rise up and confuse people and defy the whole reason for having private members.

This "weird and wacky" aspect of C# is yet one more reason why good programming has nothing to do with experience and skill, but just knowing the tricks and traps.....like working on a car. Its the argument that rules were meant to be broken, which is a very bad model for any computing language.



回答10:

It seems to me that if data was private to other instances of the same type, it wouldn't necessarily be the same type anymore. It wouldn't seem to behave or act the same as other instances. Behavior could easily be modified based on that private internal data. That would just spread confusion in my opinion.

Loosely speaking, I personally think that writing classes derived from a base class offers similar functionality you are describing with 'having private data per instance'. Instead, you just have a new class definition per 'unique' type.