Dictionary shared between objects for no reason?

Posted 2020-04-12 13:02

Question:

The following code is supposed to create a new (modified) version of a frequency distribution (nltk.FreqDist). Both variables should then be the same length.

It works fine when a single instance of WebText is created. But when multiple WebText instances are created, then the new variable seems to be shared by all the objects.

For example:

import nltk
from operator import itemgetter

class WebText:

    freq_dist_weighted = {}

    def __init__(self, text):
        tokens = nltk.wordpunct_tokenize(text) #tokenize
        word_count = len(tokens)
        freq_dist = nltk.FreqDist(tokens)


        for word,frequency in freq_dist.iteritems():
            self.freq_dist_weighted[word] = frequency/word_count*frequency
        print len(freq_dist), len(self.freq_dist_weighted)

text1 = WebText("this is a test")
text2 = WebText("this is another test")
text3 = WebText("a final sentence")

results in

4 4
4 5
3 7

Which is incorrect. Since I am just transposing and modifying values, there should be the same numbers in each column. If I reset the freq_dist_weighted just before the loop, it works fine:

import nltk
from operator import itemgetter

class WebText:

    freq_dist_weighted = {} 

    def __init__(self, text):
        tokens = nltk.wordpunct_tokenize(text) #tokenize
        word_count = len(tokens)
        freq_dist = nltk.FreqDist(tokens)
        self.freq_dist_weighted = {}

        for word,frequency in freq_dist.iteritems():
            self.freq_dist_weighted[word] = frequency/word_count*frequency
        print len(freq_dist), len(self.freq_dist_weighted)

text1 = WebText("this is a test")
text2 = WebText("this is another test")
text3 = WebText("a final sentence")

results in (correct):

4 4
4 4
3 3

This doesn't make sense to me.

I don't see why I would have to reset it, since it's isolated within the objects. Am I doing something wrong?

Answer 1:

Your comment is blatantly wrong. Objects defined in the class scope are created only once, when the class itself is created, and every instance then sees that same object. If you want a different object per instance, you need to create it in the initializer.

class WebText:
    def __init__(self, text):
        self.freq_dist_weighted = {} #### RESET the dictionary HERE ####
        ...
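
As a rough, runnable sketch of that fix (written for Python 3, and assuming `text.split()` and `collections.Counter` as stand-ins for `nltk.wordpunct_tokenize` and `nltk.FreqDist` so it runs without NLTK; the `sizes` attribute is a hypothetical addition used only to record the two lengths):

```python
from collections import Counter


class WebText:
    def __init__(self, text):
        # A fresh dict is created on every call, so each instance owns its own.
        self.freq_dist_weighted = {}

        tokens = text.split()        # assumption: stand-in for nltk.wordpunct_tokenize
        word_count = len(tokens)
        freq_dist = Counter(tokens)  # assumption: stand-in for nltk.FreqDist

        for word, frequency in freq_dist.items():
            self.freq_dist_weighted[word] = frequency / word_count * frequency

        # Hypothetical attribute: keep both lengths so they can be compared later.
        self.sizes = (len(freq_dist), len(self.freq_dist_weighted))


sentences = ("this is a test", "this is another test", "a final sentence")
texts = [WebText(s) for s in sentences]
sizes = [t.sizes for t in texts]
print(sizes)  # [(4, 4), (4, 4), (3, 3)]
```

With the dictionary created inside `__init__`, both lengths match for every instance.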


Answer 2:

Your freq_dist_weighted dictionary is a class attribute, not an instance attribute. Therefore it is shared among all instances of the class. (self.freq_dist_weighted still refers to the class attribute; since there's no instance-specific attribute of that name, Python falls back to looking on the class.)

To make it an instance attribute, set it in your class's __init__() method.

def __init__(self, text):
    self.freq_dist_weighted = {}
    ...
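
The lookup fallback described above can be observed directly. This is a small illustration only; the class name `Shared` and the attribute `data` are made up for the example and do not come from the question:

```python
class Shared:
    data = {}  # class attribute: one dict, created when the class body runs


a = Shared()
b = Shared()

# Neither instance has its own 'data', so lookup falls back to the class.
print(vars(a))           # {}
print(a.data is b.data)  # True

a.data["x"] = 1          # mutates the single shared dict
print(b.data)            # {'x': 1}

# Assigning through the instance creates an instance attribute
# that shadows the class attribute for that instance only.
a.data = {}
print(vars(a))                # {'data': {}}
print(a.data is Shared.data)  # False
print(b.data is Shared.data)  # True
```

Note the difference between mutating the dict (`a.data["x"] = 1`), which affects everyone who sees the class attribute, and rebinding the name (`a.data = {}`), which only affects `a`.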


Answer 3:

class WebText:
    freq_dist_weighted = {}

declares freq_dist_weighted so that it is shared between all objects of type WebText; essentially, this is like a static member in C++.

If you want each WebText object to have its own freq_dist_weighted member (i.e. you can change it for one instance without changing it for another instance), you want to define it in __init__:

class WebText:
    def __init__(self, text):
        self.freq_dist_weighted = {}
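
To make the static-member comparison concrete, here is a sketch in which one value is deliberately shared and another is per instance. The counter `instances_created` is a hypothetical name added purely for illustration; it is not part of the original class:

```python
class WebText:
    instances_created = 0  # hypothetical shared value, analogous to a static member

    def __init__(self, text):
        self.text = text                # per-instance state
        self.freq_dist_weighted = {}    # per-instance state
        WebText.instances_created += 1  # update the shared value through the class


first = WebText("this is a test")
second = WebText("a final sentence")
first.freq_dist_weighted["test"] = 0.25

print(WebText.instances_created)  # 2
print(second.freq_dist_weighted)  # {}
```

Class attributes are a reasonable choice when the sharing is intended, as with the counter here; state that should differ per object belongs in `__init__`.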


Answer 4:

"It works fine when a single instance of WebText is created. But when multiple WebText instances are created, then the new variable seems to be shared by all the objects."

Well, yes; of course it would work fine with a single instance when all one of them is sharing the value. ;)

The value is shared because Python follows a very simple rule: the things you define inside the class block belong to the class. I.e., they don't belong to instances. To attach something to an instance, you have to do it explicitly. This is normally done in __init__, but in normal cases (i.e. if you haven't used __slots__) can be done at any time. Assigning to an attribute of an object is just like assigning to an element of a list; there are no real protections because we're all mature adults here and are assumed to be responsible.

def __init__(self, text):
    self.freq_dist_weighted = {}
    # and proceed to modify it

Alternatively:

def __init__(self, text):
    freq_dist_weighted = {}
    # prepare the dictionary contents first
    self.freq_dist_weighted = freq_dist_weighted
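
The point about attaching attributes at any time, and the `__slots__` exception, can be sketched as follows. The class names `Plain` and `Slotted` and the attribute names `anything` and `other` are invented for this example:

```python
class Plain:
    pass


class Slotted:
    # Only the names listed here may be set on instances of this class.
    __slots__ = ("freq_dist_weighted",)


p = Plain()
p.anything = 42  # fine: ordinary instances accept new attributes at any time

s = Slotted()
s.freq_dist_weighted = {}  # allowed, because the name is listed in __slots__

rejected = False
try:
    s.other = 1  # not listed in __slots__, so the assignment is refused
except AttributeError:
    rejected = True

print(p.anything)  # 42
print(rejected)    # True
```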