In other languages, a general guideline that helps produce better code is always make everything as hidden as possible. If in doubt about whether a variable should be private or protected, it's better to go with private.
Does the same hold true for Python? Should I use two leading underscores on everything at first, and only make them less hidden (only one underscore) as I need them?
If the convention is to use only one underscore, I'd also like to know the rationale.
Here's a comment I left on JBernardo's answer. It explains why I asked this question and also why I'd like to know why Python is different from the other languages:
I come from languages that train you to think everything should be only as public as needed and no more. The reasoning is that this will reduce dependencies and make the code safer to alter. The Python way of doing things in reverse -- starting from public and going towards hidden -- is odd to me.
First: Why do you want to hide your data? Why is that so important?
Most of the time you don't really want to do it but you do because others are doing.
If you really really really don't want people using something, add one underscore in front of it. That's it... Pythonistas know that things with one underscore is not guaranteed to work every time and may change without you knowing.
That's the way we live and we're okay with that.
Using two underscores will make your class so bad to subclass that even you will not want to work that way.
There are already a lot of good answers to this, but I'm going to offer another one. This is also partially a response to people who keep saying that double underscore isn't private (it really is).
If you look at Java/C#, both of them have private/protected/public. All of these are compile-time constructs. They are only enforced at the time of compilation. If you were to use reflection in Java/C#, you could easily access private method.
Now every time you call a function in Python, you are inherently using reflection. These pieces of code are the same in Python.
The "dot" syntax is only syntactic sugar for the latter piece of code. Mostly because using getattr is already ugly with only one function call. It just gets worse from there.
So with that, there can't be a Java/C# version of private, as Python doesn't compile the code. Java and C# can't check if a function is private or public at runtime, as that information is gone (and it has no knowledge of where the function is being called from).
Now with that information, the name mangling of the double underscore makes the most sense for achieving "private-ness". Now when a function is called from the 'self' instance and it notices that it starts with '__', it just performs the name mangling right there. It's just more syntactic sugar. That syntactic sugar allows the equivalent of 'private' in a language that only uses reflection for data member access.
Disclaimer: I have never heard anybody from the Python development say anything like this. The real reason for the lack of "private" is cultural, but you'll also notice that most scripting/interpreted languages have no private. A strictly enforceable private is not practical at anything except for compile time.
First - What is name mangling?
Name mangling is invoked when you are in a class definition and use
__any_name
or__any_name_
, that is, two (or more) leading underscores and at most one trailing underscore.And now:
The ostensible use is to prevent subclassers from using an attribute that the class uses.
A potential value is in avoiding name collisions with subclassers who want to override behavior, so that the parent class functionality keeps working as expected. However, the example in the Python documentation is not Liskov substitutable, and no examples come to mind where I have found this useful.
The downsides are that it increases cognitive load for reading and understanding a code base, and especially so when debugging where you see the double underscore name in the source and a mangled name in the debugger.
My personal approach is to intentionally avoid it. I work on a very large code base. The rare uses of it stick out like a sore thumb and do not seem justified.
You do need to be aware of it so you know it when you see it.
PEP 8
PEP 8, the Python standard library style guide, currently says (abridged):
How does it work?
If you prepend two underscores (without ending double-underscores) in a class definition, the name will be mangled, and an underscore followed by the class name will be prepended on the object:
Note that names will only get mangled when the class definition is parsed:
Also, those new to Python sometimes have trouble understanding what's going on when they can't manually access a name they see defined in a class definition. This is not a strong reason against it, but it's something to consider if you have a learning audience.
One Underscore?
When my intention is for users to keep their hands off an attribute, I tend to only use the one underscore, but that's because in my mental model, subclassers would have access to the name (which they always have, as they can easily spot the mangled name anyways).
If I were reviewing code that uses the
__
prefix, I would ask why they're invoking name mangling, and if they couldn't do just as well with a single underscore, keeping in mind that if subclassers choose the same names for the class and class attribute there will be a name collision in spite of this.When in doubt, leave it "public" - I mean, do not add anything to obscure the name of your attribute. If you have a class with some internal value, do not bother about it. Instead of writing:
write this by default:
This is for sure a controversial way of doing things. Python newbies just hate it and even some old Python guys despise this default - but it is the default anyway, so I really recommend you to follow it, even if you feel uncomfortable.
If you really want to send the message to your users saying "Can't touch this!", the usual way is to precede the variable with one underscore. This is just a convention, but people understand it and take double care when dealing with such stuff:
This can be useful, too, for avoiding conflict between properties names and attribute names:
What about the double underscore? Well, the double underscore magic is used mainly to avoid accidental overloading of methods and name conflicts with superclasses attributes. It can be quite useful if you write a class that is expected to be extended many times.
If you want to use it for other purposes, you can, but it is neither usual nor recommended.
EDIT: Why is this so? Well, the usual Python style does not emphasize making things private - on the contrary! There is a lot of reasons for that - most of them controversial... Let us see some of them.
Python has properties
Most OO languages today use the opposite approach: what should not be used should not be visible, so attributes should be private. Theoretically, this would yield more manageable, less coupled classes, because no one would change values inside the objects recklessly.
However, this is not so simple. For example, Java classes do have a lot attributes and getters that just get the values and setters that just set the values. You need, let us say, seven lines of code to declare a single attribute - which a Python programmer would say is needless complex. Also, in practice you just write this lot of code to get one public field, since you can change its value using the getters and setters.
So why to follow this private-by-default policy? Just make your attributes public by default. Of course, it is problematic in Java, because if you decide to add some validation to your attribute, it would require you to change all
in your code to, let us say,
being
setAge()
:So in Java (and other languages) the default is to use getters and setters anyway, because they can be annoying to write but can spare you a lot of time if you find yourself in the situation I've described.
However, you do not need to do it in Python, since Python have properties. If you had this class:
and then you decide to validate ages, you do not need to change the
person.age = age
pieces of your code. Just add a property (as shown below)If you can do it and still use
person.age = age
, why would you add private fields and getters and setters?(Also, see Python is not Java and this article about the harms of using getters and setters.).
Everything is visible anyway - and trying to hide just complicates your work
Even in languages where there are private attributes, you can access them through some kind of reflection/introspection library. And people do it a lot of times, in frameworks and for solving urgent needs. The problem is that introspection libraries are just a hard way of doing what you could do with public attributes.
Since Python is a very dynamic language, it is just counterproductive to add this burden to your classes.
The problem is not to be possible to see - is to be required to see
For a Pythonista, encapsulation is not the inability of seeing internals of classes, but the possibility of avoiding to look at it. I mean, encapsulation is the property of a component which allows it to be used without the user being concerned about the internal details. If you can use a component without bothering yourself about its implementation, then it is encapsulated (in the opinion of a Python programmer).
Now, if you wrote your class in such a way you can use it without having to think about implementation details, there is no problem if you want to look inside the class for some reason. The question is: your API should be good and the rest is detail.
Guido said so
Well, this is not controversial: he said so, actually. (Look for "open kimono.")
This is culture
Yes, there are some reasons, but no killing reason. This is mostly a cultural aspect of programming in Python. Frankly, it could be the other way, too - but it is not. Also, you could just ask the other way around: why do some languages use private attributes by default? For the same main reason for the Python practice: because it is the culture of these languages, and each choice has advantages and disadvantages.
Since this culture grew up, you are well advised to follow it. Otherwise, you will get annoyed by Python programmers saying to you to remove the
__
of your code when you asked a question in Stack Overflow :)I wouldn't say that practice produces better code. Visibility modifiers only distract you from the task at hand, and as a side effect force your interface to be used as you intended. Generally speaking, enforcing visibility prevents programmers from messing things up if they haven't read the documentation properly.
A far better solution is the route that Python encourages: Your classes and variables should be well documented, and their behaviour clear. The source should be available. This is far more extensible and reliable way to write code.
My strategy in Python is this:
Above all, it should be clear what everything does. Document it if someone else will be using it. Document it if you want it to be useful in a year's time.
As a side note, you should actually be going with protected in those other languages: You never know your class might be inherited later and for what it might be used. Best to only protect those variables that you are certain cannot or should not be used by foreign code.
You shouldn't start with private data and make it public as necessary. Rather, you should start by figuring out the interface of your object. I.e. you should start by figuring out what the world sees (the public stuff) and then figure out what private stuff is necessary for that to happen.
Other language make difficult to make private that which once was public. I.e. I'll break lots of code if I make my variable private or protected. But with properties in python this isn't the case. Rather, I can maintain the same interface even with rearranging the internal data.
The difference between _ and __ is that python actually makes an attempt to enforce the latter. Of course, it doesn't try really hard but it does make it difficult. Having _ merely tells other programmers what the intention is, they are free to ignore at their peril. But ignoring that rule is sometimes helpful. Examples include debugging, temporary hacks, and working with third party code that wasn't intended to be used the way you use it.