Question:

I have a long text file containing truck configurations. Each line lists some properties of a truck as a string. Each property occupies its own fixed-width field in the string, such as:

2 characters = number of axles
2 characters = weight of the first axle
2 characters = weight of the second axle
...
2 characters = weight of the last axle
2 characters = length of the first axle spacing (spacing means distance between axles)
2 characters = length of the second axle spacing
...
2 characters = length of the last axle spacing

As an example:

031028331004

refers to:

number of axles = 3
first axle weight = 10
second axle weight = 28
third axle weight = 33
first spacing = 10
second spacing = 4

Now that you have an idea of my file structure, here is my problem: I would like to group these trucks into separate lists, and name the lists in terms of axle spacings. Let's say I am using a boolean approach: if the spacing is less than 6, the boolean is 1; if it is greater than 6, the boolean is 0. To clarify, the possible outcomes for a three-axle truck are:

00 #Both spacings > 6
10 #First spacing < 6, second > 6
01 #First spacing > 6, second < 6
11 #Both spacings < 6
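
The encoding above can be sketched in Python as follows. The function name `spacing_code` and the `threshold` parameter are illustrative and not part of the original post. The post does not say what happens when a spacing is exactly 6; this sketch assumes it counts as 0.

```python
def spacing_code(spacings, threshold=6):
    """Return one character per spacing: '1' if below threshold, else '0'."""
    # Assumption: a spacing equal to the threshold is encoded as '0'.
    return ''.join('1' if s < threshold else '0' for s in spacings)

print(spacing_code([10, 4]))  # 01
print(spacing_code([3, 5]))   # 11
```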

Now, as you see, there are not too many outcomes for a 3-axle truck. However, if I have a 12-axle truck, the number of "possible" combinations goes haywire. The thing is, in reality you would not see all "possible" combinations of axle spacings in a 12-axle truck. Only certain combinations occur (I don't know which ones; finding out is my aim), and their number is much smaller than the number of "possible" combinations.

I would like the code to create lists and fill them with the strings that define the properties I mentioned above, but only if such a combination exists. I thought maybe I should create lists with variable names such as:

truck_0300[]
truck_0301[]
truck_0310[]
truck_0311[]

on the fly. However, from what I read in SF and other sources, this is strongly discouraged. How would you do it using the dictionary concept? I understand that dictionaries are like 2 dimensional arrays, with a key (in my case the keys would be something like truck_0300, truck_0301 etc.) and value pair (again in my case, the values would probably be lists that hold the actual strings that belong to the corresponding truck type), however I could not figure out how to create that dictionary, and populate it with variable keys and values.

Any insight would be welcome! Thanks a bunch!

Answer 1:

You are definitely correct that it is almost always a bad idea to try and create "dynamic variables" in a scope. Dictionaries usually are the answer to build up a collection of objects over time and reference back to them...

I don't fully understand your application and format, but in general, defining and using your dictionary would look like this:

trucks = {}
trucks['0300'] = ['a']
trucks['0300'].append('b')
trucks['0300'].extend(['c','d'])

aTruck = trucks['0300']

Now, since every one of these should be a list of your strings, you might just want to use a defaultdict, and tell it to use a list as the default value for nonexistent keys:

from collections import defaultdict

trucks = defaultdict(list)
trucks['0300']
# []

Note that even though it was a brand new dict that contained no entries, the '0300' key still returns a new list. This means you don't have to check for the key. Just append:

trucks = defaultdict(list)
trucks['0300'].append('a')

A defaultdict is probably what you want, since you do not have to pre-define keys at all. It is there when you are ready for it.
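
One caveat worth sketching: merely indexing a missing key with square brackets inserts it into a defaultdict. Since the goal here is to find out which combinations actually occur, a membership test with `in` (or a lookup with `.get`) avoids creating empty entries by accident. The keys below are made up for illustration.

```python
from collections import defaultdict

trucks = defaultdict(list)
trucks['0301'].append('031028331004')

# Membership tests and .get() do not create new entries.
print('0300' in trucks)    # False
print(trucks.get('0300'))  # None
print(len(trucks))         # 1

# Indexing a missing key does create an (empty) entry.
trucks['0300']
print(len(trucks))         # 2
```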

Getting key for the max value

From your comments, here is an example of how to get the key with the max value of a dictionary. It is pretty easy, as you just use max and define how it should determine the key to use for the comparisons:

d = {'a':10, 'b':5, 'c':50}
# Python 3: items() replaces iteritems(), and lambdas cannot unpack tuples
print(max(d.items(), key=lambda kv: kv[1]))
# ('c', 50)
d['c'] = 1
print(max(d.items(), key=lambda kv: kv[1]))
# ('a', 10)

All you have to do is define how to produce a comparison key. In this case I just tell it to take the value as the key. For really simple key functions like this, where you are just telling it to pull an index or attribute from the object, you can make it more efficient by using the operator module so that the key function is in C and not in Python as a lambda:

from operator import itemgetter
...
print(max(d.items(), key=itemgetter(1)))
#('c', 50)

itemgetter creates a new callable that will pull the second item from the tuple that is passed in by the loop.

Now assume each value is actually a list (similar to your structure). We will make it a list of numbers, and you want to find the key which has the list with the largest total:

d = {'a': list(range(1,5)), 'b': list(range(2,4)), 'c': list(range(5,7))}
print(max(d.items(), key=lambda kv: sum(kv[1])))
# ('c', [5, 6])
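
Applied to the grouping problem, the same idea can pick out the combination that holds the most trucks. This is only a sketch, written in Python 3 syntax; the dictionary contents are invented for illustration.

```python
groups = {
    '0300': ['truck_a', 'truck_b'],
    '0301': ['truck_c', 'truck_d', 'truck_e'],
    '0311': ['truck_f'],
}

# Compare entries by the length of each list rather than by the list itself.
key, members = max(groups.items(), key=lambda item: len(item[1]))
print(key, len(members))  # 0301 3
```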

Answer 2:

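Define a dictionary d = {} and loop over your lines. A plain dict handles far more than 10,000 keys as long as you test membership with `key not in d`, which is a constant-time hash lookup, rather than against `d.keys()`, which in Python 2 builds and scans a list on every check:

key = line[:4]
if key not in d:
    d[key] = []
d[key].append(somevalue)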

I hope this helps.
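
As a fuller sketch of this loop, the key can be derived from the record layout described in the question instead of a fixed slice. The helper name `group_key`, the `threshold` parameter, and every sample record except the first are assumptions made for illustration; `dict.setdefault` stands in for the explicit membership check.

```python
def group_key(line, threshold=6):
    """Build a key such as '0301' from one fixed-width truck record."""
    n = int(line[:2])   # number of axles
    start = 2 + 2 * n   # spacings follow the n axle weights
    spacings = [int(line[i:i + 2])
                for i in range(start, start + 2 * (n - 1), 2)]
    # Assumption: a spacing equal to the threshold is encoded as '0'.
    flags = ''.join('1' if s < threshold else '0' for s in spacings)
    return line[:2] + flags

lines = ['031028331004', '031230351104', '02102005']

groups = {}
for line in lines:
    # setdefault returns the existing list, or stores and returns a new one.
    groups.setdefault(group_key(line), []).append(line)

print(groups)
# {'0301': ['031028331004', '031230351104'], '021': ['02102005']}
```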

Answer 3:

Here's a complete solution from string to output:

from collections import namedtuple, defaultdict

# lightweight class
Truck = namedtuple('Truck', 'weights spacings')

def parse_truck(s):
    # convert to array of numbers
    numbers = [int(''.join(t)) for t in zip(s[::2], s[1::2])]

    # check length
    n = numbers[0]
    assert n * 2 == len(numbers)
    numbers = numbers[1:]

    return Truck(numbers[:n], numbers[n:])

trucks = [
    parse_truck("031028331004"),
    # ...
]

# dictionary where every key contains a list by default
trucks_by_spacing = defaultdict(list)

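for truck in trucks:
    # (True, False) instead of '10'; True means the spacing is less than 6,
    # matching the convention in the question
    key = tuple(space < 6 for space in truck.spacings)
    trucks_by_spacing[key].append(truck)

print(trucks_by_spacing)

# the sample truck has spacings 10 and 4, so its key is (False, True)
print(trucks_by_spacing[False, True])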
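
To report which combinations actually occur, the tuple keys can be turned back into labels in the question's `0301` style. The sketch below follows the question's convention (flag set when the spacing is less than 6); the helper name `label` and the sample spacing lists are invented for illustration.

```python
from collections import defaultdict

# Invented sample: one list of axle spacings per truck.
spacing_lists = [[10, 4], [11, 4], [5]]

counts = defaultdict(int)
for spacings in spacing_lists:
    key = tuple(s < 6 for s in spacings)
    counts[key] += 1

def label(key):
    """Turn (False, True) into '0301': axle count, then one flag per spacing."""
    axles = len(key) + 1  # n axles have n - 1 spacings
    return '%02d' % axles + ''.join('1' if flag else '0' for flag in key)

for key in sorted(counts, key=label):
    print(label(key), counts[key])
# 021 1
# 0301 2
```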