Multiple keys per value

Posted 2019-01-09 01:00

Question:

Is it possible to assign multiple keys per value in a Python dictionary? One possible solution is to assign the value to each key:

dict = {'k1':'v1', 'k2':'v1', 'k3':'v1', 'k4':'v2'}

but this is not memory efficient, since my data file is > 2 GB. Alternatively, you could make a dictionary of dictionary keys:

key_dict = {'k1':'k1', 'k2':'k1', 'k3':'k1', 'k4':'k4'}
dict = {'k1':'v1', 'k4':'v2'}
main_key = key_dict['k2']
value = dict[main_key]

This is also very time-consuming and laborious, because I have to go through the whole dictionary/file twice. Is there any other easy, built-in Python solution?

Note: my dictionary values are not simple strings (as in the question's 'v1', 'v2') but complex objects (they contain other dictionaries, lists, etc., and cannot be pickled).

Note: this question seems similar to "How can I use both a key and an index for the same dictionary value?", but I am not looking for an ordered/indexed dictionary; I am looking for other efficient solutions (if any) than the two mentioned in this question.

Answer 1:

What type are the values?

dict = {'k1':MyClass(1), 'k2':MyClass(1)}

will give duplicate value objects, but

v1 = MyClass(1)
dict = {'k1':v1, 'k2':v1}

results in both keys referring to the same actual object.

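In the original question, your values are strings: even though you're declaring the same string twice, I think they'll be interned to the same object in that case. (Identical string literals usually are shared, at least in CPython; strings read from a file at run time generally are not shared automatically.)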
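
A minimal sketch, assuming Python 3, of making equal run-time strings share one object explicitly with sys.intern. The sample strings are made up for illustration and are not from the question's data:

```python
import sys

# Build two equal strings at run time, as would happen when reading a file.
# (The pieces joined here are illustrative only.)
a = "".join(["value-", "1"])
b = "".join(["value-", "1"])

# sys.intern returns one canonical object for all equal strings.
a = sys.intern(a)
b = sys.intern(b)

d = {"k1": a, "k2": b}
print(d["k1"] is d["k2"])  # True
```

This only helps for string values; for the complex objects mentioned in the question, the sharing has to be arranged by the code that builds the dictionary.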


NB. if you're not sure whether you've ended up with duplicates, you can find out like so:

if dict['k1'] is dict['k2']:
    print("good: k1 and k2 refer to the same instance")
else:
    print("bad: k1 and k2 refer to different instances")

(The is check is thanks to J.F. Sebastian; it replaces an earlier version that used id().)
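
To check a whole dictionary rather than one pair of keys, one option is to count the distinct value objects. A small sketch, where MyClass and count_distinct_objects are hypothetical names introduced only for this illustration:

```python
class MyClass:
    """Hypothetical stand-in for the complex value type in the question."""
    def __init__(self, n):
        self.n = n


def count_distinct_objects(mapping):
    """Return how many distinct value objects the mapping refers to."""
    return len({id(value) for value in mapping.values()})


duplicated = {"k1": MyClass(1), "k2": MyClass(1)}  # two separate instances
v1 = MyClass(1)
shared = {"k1": v1, "k2": v1}                      # one instance, two keys

print(count_distinct_objects(duplicated))  # 2
print(count_distinct_objects(shared))      # 1
```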



Answer 2:

Check this out; it's an implementation of exactly what you're asking for: multi_key_dict(ionary)

https://pypi.python.org/pypi/multi_key_dict (sources at https://github.com/formiaczek/python_data_structures/tree/master/multi_key_dict)

(on Unix platforms it possibly comes as a package and you can try to install it with something like:

sudo apt-get install python-multi-key-dict

for Debian, or an equivalent for your distribution)

You can use different types for keys but also keys of the same type. Also you can iterate over items using key types of your choice, e.g.:

from multi_key_dict import multi_key_dict

m = multi_key_dict()
m['aa', 12] = 12
m['bb', 1] = 'cc and 1'
m['cc', 13] = 'something else'

print(m['aa'])   # will print '12'
print(m[12])     # will also print '12'

# but also:
for key, value in m.iteritems(int):
    print(key, ':', value)
# will print:
# 1 : cc and 1
# 12 : 12
# 13 : something else

# and iterating by string keys:
for key, value in m.iteritems(str):
    print(key, ':', value)
# will print:
# aa : 12
# cc : something else
# bb : cc and 1

m[12] = 20 # now update the value
print(m[12])     # will print '20' (updated value)
print(m['aa'])   # will also print '20' (it maps to the same element)

There is no limit to the number of keys, so code like:

m['a', 3, 5, 'bb', 33] = 'something' 

is valid, and any of the keys can be used to refer to the value so created (to read, write, or delete it).

Edit: From version 2.0 it should also work with python3.
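
If installing a package is not an option, the same idea can be sketched in a few lines of plain Python. This is only an illustration of the two-level lookup from the question, not how multi_key_dict is implemented; AliasDict and its add method are hypothetical names, and the sketch does not handle deletion or re-using a key:

```python
class AliasDict:
    """Hypothetical sketch: several keys resolve to one stored value."""

    def __init__(self):
        self._slot_of = {}   # any key -> internal slot number
        self._values = {}    # slot number -> value (stored once)
        self._next_slot = 0

    def add(self, keys, value):
        """Store value once and make every key in keys refer to it."""
        slot = self._next_slot
        self._next_slot += 1
        self._values[slot] = value
        for key in keys:
            self._slot_of[key] = slot

    def __getitem__(self, key):
        return self._values[self._slot_of[key]]

    def __setitem__(self, key, value):
        # Updating through one key is visible through all of its aliases.
        self._values[self._slot_of[key]] = value


m = AliasDict()
m.add(["aa", 12], "first")
m.add(["bb", 1], "second")
m[12] = "updated"
print(m["aa"])  # updated
print(m["bb"])  # second
```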



Answer 3:

I'm surprised no one has mentioned using tuples with dictionaries. This works just fine:

my_dictionary = {}
my_dictionary[('k1', 'k2', 'k3')] = 'v1'
my_dictionary[('k4',)] = 'v2'  # trailing comma needed: ('k4') is just the string 'k4'
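
One caveat with tuple keys: a lookup needs the whole tuple, and a single member such as 'k2' is not itself a key. A minimal sketch of that behaviour, with a hypothetical helper find_by_member that falls back to a linear scan:

```python
my_dictionary = {("k1", "k2", "k3"): "v1", ("k4",): "v2"}

print(my_dictionary[("k1", "k2", "k3")])  # v1
print("k2" in my_dictionary)              # False


def find_by_member(mapping, member):
    """Hypothetical helper: linear scan for a tuple key containing member."""
    for keys, value in mapping.items():
        if member in keys:
            return value
    raise KeyError(member)


print(find_by_member(my_dictionary, "k2"))  # v1
```

The scan costs time proportional to the number of entries, so for a large data set this trades the memory saving for slow lookups.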


Answer 4:

Using Python 2.7 or 3, you can combine (keys, value) pairs with a dictionary comprehension.

keys_values = ( (('k1','k2'), 0), (('k3','k4','k5'), 1) )

d = { key : value for keys, value in keys_values for key in keys }

You can also update the dictionary similarly.

keys_values = ( (('k1',), int), (('k3','k4','k6'), int) )

d.update({ key : value for keys, value in keys_values for key in keys })

I don't think this really gets to the heart of your question, but in light of the title, I think it belongs here.
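
On the memory concern from the question: the comprehension stores a reference to the same value object under each key of a group, so the value itself is not copied. A small sketch, where Record is a hypothetical stand-in for a large value object:

```python
class Record:
    """Hypothetical stand-in for a large value object."""
    def __init__(self, name):
        self.name = name


keys_values = (
    (("k1", "k2"), Record("first")),
    (("k3", "k4", "k5"), Record("second")),
)

d = {key: value for keys, value in keys_values for key in keys}

print(d["k1"] is d["k2"])                # True
print(d["k1"] is d["k3"])                # False
print(len({id(v) for v in d.values()}))  # 2
```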



Answer 5:

The most straightforward way to do this is to construct your dictionary using the dict.fromkeys() method. It takes an iterable of keys and a value as inputs and assigns that value to each key.
Your code would be:

d = dict.fromkeys(['k1', 'k2', 'k3'], 'v1')
d.update(dict.fromkeys(['k4'], 'v2'))

And the output is:

print(d)
{'k1': 'v1', 'k2': 'v1', 'k3': 'v1', 'k4': 'v2'}
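
Note that dict.fromkeys() stores the same object under every key rather than a copy. That is what keeps memory use down, but with a mutable value a change made through one key shows up under all of them. A short sketch using a list as an illustrative value:

```python
d = dict.fromkeys(["k1", "k2", "k3"], [])  # one list object, three keys

print(d["k1"] is d["k2"])  # True

d["k1"].append("x")        # mutate through one key...
print(d["k3"])             # ['x']  ...and every key sees the change
```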


Answer 6:

You can build an auxiliary dictionary of objects that were already created from the parsed data. The key would be the parsed data, the value would be your constructed object -- say the string value should be converted to some specific object. This way you can control when to construct the new object:

existing = {}   # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    obj = existing.setdefault(v, MyClass(v))  # could be made more efficient
    result[k] = obj

Then all duplicate values in the result dictionary will be represented by a single object of the MyClass class. After building the result, the auxiliary dictionary existing can be deleted.

Here dict.setdefault() may be elegant and brief, but you should test whether the more verbose solution below is more efficient. The reason is that, in the above example, MyClass(v) is always created and then thrown away if a duplicate already exists:

existing = {}   # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    if v in existing:
        obj = existing[v]
    else:
        obj = MyClass(v)
        existing[v] = obj

    result[k] = obj

This technique can also be used when v is not converted to anything special. For example, if v is a string, the key and the value in the auxiliary dictionary will be the same string. However, the existence of the dictionary ensures that the object will be shared (which is not always ensured by Python).
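
A self-contained run of the verbose variant, to show that each distinct value is constructed only once. Here parsed_data_generator and MyClass are hypothetical stand-ins (the real parser and value type are not shown in the question), and the sketch assumes the parsed data v is hashable:

```python
class MyClass:
    """Hypothetical value type that counts how often it is constructed."""
    created = 0

    def __init__(self, data):
        MyClass.created += 1
        self.data = data


def parsed_data_generator():
    """Hypothetical stand-in for the parser: yields (key, raw value) pairs."""
    yield "k1", "v1"
    yield "k2", "v1"
    yield "k3", "v1"
    yield "k4", "v2"


existing = {}
result = {}
for k, v in parsed_data_generator():
    if v not in existing:
        existing[v] = MyClass(v)   # constructed only for unseen data
    result[k] = existing[v]
del existing                       # the auxiliary dictionary is no longer needed

print(MyClass.created)               # 2
print(result["k1"] is result["k3"])  # True
```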



Answer 7:

I was able to achieve similar functionality using pandas MultiIndex, although in my case the values are scalars:

>>> import numpy
>>> import pandas
>>> keys = [numpy.array(['a', 'b', 'c']), numpy.array([1, 2, 3])]
>>> df = pandas.DataFrame(['val1', 'val2', 'val3'], index=keys)
>>> df.index.names = ['str', 'int']
>>> df.xs('b', axis=0, level='str')
        0
int      
2    val2

>>> df.xs(3, axis=0, level='int')
        0
str      
c    val3