Python's accessible class variables, sensitive

Posted 2020-07-19 03:33

I was trying to make a variable inaccessible for a project I'm doing, and I ran across the SO post "Does Python have “private” variables in classes?". It raised some interesting questions for me that, to keep this answerable, I'll label Q1, Q2, etc. I've looked around but didn't find answers to them, especially to those about sensitive data.

I found useful stuff in that post, but it seems that the general consensus was something like if you see a variable with a _ before it, act like an adult and realize you shouldn't be messing with it. The same kind of idea was put forward for variables preceded by __. There, I got the general idea that you trust people not to use tricks like those described here and (in more detail) here. I also found some good information at this SO post.

This is all very good advice when you're talking about good coding practices.

I posted some thoughts in comments to the posts I've shared. My main question was posted as a comment.

I'm surprised there hasn't been more discussion of those who want to introduce malicious code. This is a real question: Is there no way in Python to prevent a black-hat hacker from accessing your variables and methods and inserting code/data that could deny service or reveal personal (or proprietary company) information (Q1)? If Python doesn't allow this type of security, should it ever be used for sensitive data (Q2)?

Or am I totally missing something: could a malicious coder even access variables and methods to insert code/data that could deny service or reveal sensitive data (Q3)?

I imagine I could be misunderstanding a concept, missing something, putting a problem in a place where it doesn't belong, or just being completely ignorant on what computer security is. However, I want to understand what's going on here. If I'm totally off the mark, I want an answer that tells me so, but I would also like to know how I'm totally off the mark and how to get back on it.

Another part of the question I'm asking here is from another comment I made on those posts/answers. @SLott said (somewhat paraphrased)

... I've found that private and protected are very, very important design concepts. But as a practical matter, in tens of thousands of lines of Java and Python, I've never actually used private or protected. ... Here's my question "protected [or private] from whom?"

To try and find out whether my concerns are anything to be concerned about, I commented on that post. Here it is, edited.

Q: "protected from whom?" A: "From malicious, black-hat hackers who would want to access variables and functions so as to be able to deny service, to access sensitive info, ..." It seems the A._no_touch = 5 approach would cause such a malicious coder to laugh at my "please don't touch this". My A.__get_SSN(self) seems to be just wishful hoping that B.H. (Black Hat) doesn't know the x = A(); x._A__get_SSN() trick (trick by @Zorf).

I could be putting the problem in the wrong place, and if so, I'd like someone to tell me that and also explain why. Are there ways of being secure with a class-based approach (Q4)? What other non-class-and-variable solutions are there for handling sensitive data in Python (Q5)?

Here's some code that shows why the answers to these questions make me wonder whether Python should ever be used for sensitive data (Q2). It's not complete code (why would I put these private values and methods down without using them anywhere?), but I hope it shows the type of thing I'm trying to ask about. I typed and ran all this at the Python interactive console.

## Type this into the interpreter to define the class.
class A():
  def __init__(self):
    self.name = "Nice guy."
    self.just_a_4 = 4
    self.my_number = 4
    self._this_needs_to_be_pi = 3.14
    self.__SSN = "I hope you do not hack this..."
    self.__bank_acct_num = 123
  def get_info(self):
    print("Name, SSN, bank account.")
  def change_my_number(self, another_num):
    self.my_number = another_num
  def _get_more_info(self):
    print("Address, health problems.")
  def send_private_info(self):
    print(self.name, self.__SSN, self.__bank_acct_num)
  def __give_20_bucks_to(self, ssn):
    self.__SSN += " has $20"
  def say_my_name(self):
    print("my name")
  def say_my_real_name(self):
    print(self.name)
  def __say_my_bank(self):
    print(str(self.__bank_acct_num))
>>> my_a = A()
>>> my_a._this_needs_to_be_pi
3.14
>>> my_a._this_needs_to_be_pi=4 # I just ignored begins-with-`_` 'rule'.
>>> my_a._this_needs_to_be_pi
4

## This next method could actually be setting up some kind of secure connection,  
## I guess, which could send the private data. I just print it, here.
>>> my_a.send_private_info()
Nice guy. I hope you do not hack this... 123

## Easy access and change a "private" variable
>>> my_a.__SSN
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__SSN'
>>> my_a.__dict__
{'name': 'Nice guy.', 'just_a_4': 4, 'my_number': 4, '_this_needs_to_be_pi': 4, 
'_A__SSN': 'I hope you do not hack this...', '_A__bank_acct_num': 123}
>>> my_a._A__SSN
'I hope you do not hack this...'

# (maybe) potentially more dangerous
>>> def give_me_your_money(self, bank_num):
...     print("I don't know how to inject code, but I can")
...     print("access your bank account number:")
...     print(self._A__bank_acct_num)  # use the object passed in, not the global my_a
...     print("and use my account number:")
...     print(bank_num)
...
>>> give_me_your_money(my_a,345)
I don't know how to inject code, but I can
access your bank account number:
123
and use my account number:
345

At this point, I re-entered the class definition, which probably wasn't necessary.

>>> this_a = A()
>>> this_a.__give_20_bucks_to('unnecessary param')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__give_20_bucks_to'
>>> this_a._A__give_20_bucks_to('unnecessary param')
>>> this_a._A__SSN
'I hope you do not hack this... has $20'

## Adding a fake "private" variable, `this_a.__SSN`
>>> this_a.__SSN = "B.H.'s SSN"
>>> this_a.__dict__
{'name': 'Nice guy.', 'just_a_4': 4, 'my_number': 4, '_this_needs_to_be_pi': 3.14, 
'_A__SSN': 'I hope you do not hack this... has $20', '_A__bank_acct_num': 123, 
'__SSN': "B.H.'s SSN"}
>>> this_a.__SSN
"B.H.'s SSN"

## Now, changing the real one and "sending/stealing the money"
>>> this_a._A__SSN = "B.H.'s SSN"
>>> this_a._A__give_20_bucks_to('unnecessary param')
>>> this_a._A__SSN
"B.H.'s SSN has $20"

I've actually done some work at a previous contracting job with sensitive data - not SSNs and bank account numbers, but things like people's ages, addresses, phone numbers, personal history, marital and other relationship history, criminal records, etc. I wasn't involved in the programming to secure this data; I helped extract useful information by ground-truthing the data in preparation for machine learning. We had permission and legal go-aheads to work with such data. Another main question is this: how, in Python, could one collect, manage, analyze, and draw useful conclusions from this sensitive data (Q6)? From what I've discussed here, it doesn't seem that classes (or any of the other data structures, which I didn't go into here but which seem to have the same problems) would allow this to be done securely (privately or in a protected manner). I imagine that a class-based solution probably has something to do with compilation. Is this true (Q7)?

Finally, since it was code reliability rather than security that brought me here, I'll share another post I found, and a comment I made on it, to complete my questions.

@Marcin posted,

[In response to the OP's words,] "The problem is simple. I want private variables to be accessed and changed only inside the class." [Marcin responded] So, don't write code outside the class that accesses variables starting with __. Use pylint or the like to catch style mistakes like that.

My goal with the following reply comment was to see whether my thoughts represent actual coding concerns. I hope it didn't come across as rude.

It seems this answer would be nice if you wrote code only for your own personal enjoyment and never had to hand it on to someone else to maintain it. Any time you're in a collaborative coding environment (any post-secondary education and/or work experience), the code will be used by many. Someone down the line will want to use an easy way to change your __you_really_should_not_touch_this variable. They may have a good reason for doing so, but it's possible you set up your code such that their "easy way" is going to break things.

Is mine a valid point, or do most coders respect the double underscore (Q8)? Is there a better way, using Python, to protect the integrity of the code - better than the __ strategy (Q9)?

1 answer
ゆ 、 Hurt°
#2 · 2020-07-19 04:00

private and protected do not exist for security. They exist to enforce contracts within your code, namely logical encapsulation. If you mark a piece as protected or private, it means that it is a logical implementation detail of the implementing class, and no other code should touch it directly, since other code may not [be able to] use it correctly and may mess up state.

E.g., if your logical rule is that whenever you change self._a you must also update self._b with a certain value, then you don't want external code to modify those variables, as your internal state may get messed up if the external code does not follow this rule. You want only your one class to handle this internally since that localises the potential points of failure.
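As a minimal sketch of that kind of rule: the class name `Linked` and the rule "`_b` is always twice `_a`" are made up for illustration. The property setter is the single place where the rule is applied, and the last lines show that nothing but convention keeps outside code from going around it.

```python
class Linked:
    """Hypothetical class whose rule is: _b must always equal 2 * _a."""

    def __init__(self, a):
        self._a = a
        self._b = 2 * a

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        # The only place where the rule is applied.
        self._a = value
        self._b = 2 * value

    @property
    def b(self):
        return self._b


obj = Linked(1)
obj.a = 5        # goes through the setter, so _b is updated as well
print(obj.b)     # 10

obj._a = 7       # bypasses the setter; Python does not stop this
print(obj.b)     # still 10, so the state no longer follows the rule
```

The underscore does not prevent the last assignment. It only tells the next programmer that doing so may break the object's internal logic.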

In the end all this gets compiled into a big ball of bytes anyway, and all the data is stored in memory at runtime. At that point there is no protection of individual memory offsets within the application's scope; it's all just byte soup. protected and private are constraints the programmer imposes on their own code to keep their own logic straight. For this purpose, more or less informal conventions like _ are perfectly adequate.
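To illustrate the "no protection within the application's scope" point, here is a small sketch that hides a value in a closure instead of an attribute. The function names and the placeholder string are hypothetical. Code running in the same process can still read the value through ordinary introspection.

```python
def make_account(ssn):
    # ssn is not stored on any object attribute; it is only captured
    # by the inner function as a closure variable.
    def last_four():
        return ssn[-4:]
    return last_four


get_last_four = make_account("000-00-1234")  # placeholder value, not a real SSN
print(get_last_four())                       # 1234

# The "hidden" value is still reachable from inside the same process.
hidden = get_last_four.__closure__[0].cell_contents
print(hidden)                                # 000-00-1234
```

So moving the data somewhere less obvious changes how much effort it takes to reach it, not whether in-process code can reach it.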

An attacker cannot attack at the level of individual properties. The running software is a black box to them; whatever goes on internally doesn't matter. If an attacker is in a position to actually access individual memory offsets, or actually inject code, then it's pretty much game over either way. protected and private don't matter at that point.
