Iterate in 2 different dictionaries simultaneously

Posted 2019-07-27 11:57

Question:

---EDIT 2--- To answer "why do I use dictionaries?": this question is a follow-up to this one: csv file compression without using existing libraries in Python

I need to compress a CSV file with 500k lines (19 MB). I chose to use dictionaries so that I can store the ticks in one CSV file and the symbs in another, which lets me decompress the values later.

QUESTION: What is the most efficient way to iterate? This is just an example with 4 rows, but my real file has 500,000 lines, and it takes forever to iterate through them.

I have 3 dictionaries:

originalDict = {
               0: ['6NH8', 'F', 'A', '0', '60541567', '60541567', '78.78', '20'], 
               1: ['6NH8', 'F', 'A', '0', '60541569', '60541569', '78.78', '25'], 
               2: ['6AH8', 'F', 'B', '0', '60541765', '60541765', '90.52', '1'], 
               3: ['QMH8', 'F', 'B', '0', '60437395', '60437395', '950.5', '1']
               }
ticks = {0: '6NH8', 1: '6AH8', 2: 'QMH8'}
symbs = {0: 'F,A', 1: 'F,B'}

I want to iterate through originalDict and replace the tick at index 0 with its key from ticks, then replace the symbol pair at index 1 and index 2 with its key from symbs (stored at index 1), and then remove index 2.

For example,

0: ['6NH8', 'F', 'A', '0', '60541567', '60541567', '78.78', '20']

becomes:

[0, '0', '0', '60541567', '60541567', '78.78', '20']

Currently I have a for loop over the values in originalDict, with inner for loops over ticks and symbs:

for values in originalDict.values():
    for ticksKey, ticksValue in ticks.items():
        if values[0] == ticksValue:
            values[0] = ticksKey

    #Change symbs and remove char combination
    for symbsKey, symbsValue in symbs.items():
        comprComb = values[1] + "," + values[2]

        if comprComb == symbsValue:
            values[1] = str(symbsKey)
            #del values[4]
            #del values[4]
            del values[2]

ADDITIONAL INFO ADDED: The reason I store them as dictionaries is that, across the 500,000 lines, some ticks occur more than once, so I give each one an int, which is its key in the dict. The same goes for the symbs dictionary.

Answer 1:

First of all, you want to reverse the mapping. You are currently looking entries up by value, which means scanning the whole dictionary for every row and is slow:

ticks = {0: '6NH8', 1: '6AH8', 2: 'QMH8'}
symbs = {0: 'F,A', 1: 'F,B'}

Using ticks = {v: k for k, v in ticks.items()} (same for symbs):

{'6NH8': 0, 'QMH8': 2, '6AH8': 1} # ticks

{'F,A': 0, 'F,B': 1} # symbs

Now that you have good data structures you can do this rather fast.
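To make the difference concrete, here is a minimal sketch comparing the two kinds of lookup on the sample ticks dictionary. The helper name find_key_by_scan and the variable ticks_by_symbol are hypothetical; the helper only mimics what the nested loop in the question does for every row.

```python
ticks = {0: '6NH8', 1: '6AH8', 2: 'QMH8'}

def find_key_by_scan(mapping, wanted):
    """Linear search by value: up to len(mapping) comparisons per call."""
    for key, value in mapping.items():
        if value == wanted:
            return key
    return None

# Inverted mapping: a single hash lookup per call, which on average
# does not grow with the size of the dictionary.
ticks_by_symbol = {v: k for k, v in ticks.items()}

scan_result = find_key_by_scan(ticks, 'QMH8')
lookup_result = ticks_by_symbol['QMH8']
print(scan_result, lookup_result)
```

Both approaches return the same key; the scan simply repeats its work for each of the 500,000 rows.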

Next, convert the dictionary that holds the data into a list (I'm not sure why it is a dictionary to start with):

originalList = [originalDict[k] for k in range(len(originalDict))]

And re-map values:

for line in originalList:
    line[0] = ticks[line[0]]
    line[1:3] = [symbs["%s,%s" % tuple(line[1:3])]]

result:

[[0, 0, '0', '60541567', '60541567', '78.78', '20'], [0, 0, '0', '60541569', '60541569', '78.78', '25'], [1, 1, '0', '60541765', '60541765', '90.52', '1'], [2, 1, '0', '60437395', '60437395', '950.5', '1']]
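Since the question mentions decompressing later, a quick way to sanity-check the re-mapping is to undo it for one row. This is only a sketch: decompress_row is a hypothetical helper, and ticks and symbs here are the original int-to-string dictionaries from the question, not the reversed ones.

```python
ticks = {0: '6NH8', 1: '6AH8', 2: 'QMH8'}
symbs = {0: 'F,A', 1: 'F,B'}

# One compressed row, taken from the result above.
compressed = [1, 1, '0', '60541765', '60541765', '90.52', '1']

def decompress_row(row, ticks, symbs):
    """Hypothetical helper: rebuild the original row from the int codes."""
    tick = ticks[row[0]]
    first, second = symbs[row[1]].split(',')
    # The remaining fields were never changed, so they are copied as-is.
    return [tick, first, second] + row[2:]

restored = decompress_row(compressed, ticks, symbs)
print(restored)
```

The restored row matches entry 2 of originalDict, which suggests the compressed form keeps all the information needed to reverse it.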


Answer 2:

You can speed up the lookup by inverting the keys and values in the ticks and symbs dicts and then just looking up the right values instead of iterating and comparing all the values in the dicts:

ticks_inv = {v: k for k, v in ticks.items()}
symbs_inv = {v: k for k, v in symbs.items()}

for values in originalDict.values():
    if values[0] in ticks_inv:
        values[0] = ticks_inv[values[0]]

    comprComb = "{v[1]},{v[2]}".format(v=values)
    if comprComb in symbs_inv:
        values[1] = symbs_inv[comprComb]
        del values[2]

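The result is the same as with your code (except that values[1] ends up as an int rather than a string, since your code wraps the key in str()), but it should be much faster, particularly if ticks and symbs are large. Of course, this assumes that the values in ticks and symbs are unique, but if they were not, your code would not work correctly either.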
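The uniqueness assumption can be checked while inverting. The following is a minimal sketch, assuming you would rather fail loudly than lose an entry; invert_unique is a hypothetical helper, not part of the code above, and the duplicated dictionary is made-up data.

```python
def invert_unique(mapping):
    """Hypothetical helper: invert a dict, refusing duplicate values."""
    inverted = {}
    for key, value in mapping.items():
        if value in inverted:
            raise ValueError(
                "value %r appears under keys %r and %r"
                % (value, inverted[value], key)
            )
        inverted[value] = key
    return inverted

symbs = {0: 'F,A', 1: 'F,B'}
symbs_inv = invert_unique(symbs)

# With a duplicate value, a plain dict comprehension silently keeps
# only the last key it sees.
duplicated = {0: 'F,A', 1: 'F,A'}
silent = {v: k for k, v in duplicated.items()}
print(symbs_inv, silent)
```

With unique values, invert_unique behaves exactly like the dict comprehension; it only differs when a value occurs twice.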



Answer 3:

Your dictionary is backwards; it's not using the dictionary's key-lookup feature. Instead of

for ticksKey, ticksValue in ticks.items():
    if values[0] == ticksValue:
        values[0] = ticksKey

try

ticks = {'6NH8': 0, '6AH8': 1, 'QMH8': 2}
...
if values[0] in ticks:
    values[0] = ticks[values[0]]

A more compact alternative is dict.get with a default, which leaves the value unchanged when the key is missing:

values[0] = ticks.get(values[0], values[0])

If you do that, and similarly with symbs, you'll remove all but the necessary outermost loop and see a significant performance improvement.
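Putting both lookups together, the single remaining loop might look like the sketch below. It assumes ticks and symbs are already reversed (string to int); compress_row is a hypothetical helper name, and the third row is made-up data to show what happens with unknown codes.

```python
ticks = {'6NH8': 0, '6AH8': 1, 'QMH8': 2}
symbs = {'F,A': 0, 'F,B': 1}

def compress_row(values, ticks, symbs):
    """Hypothetical helper: replace the tick and the symbol pair in place."""
    # dict.get returns its second argument when the key is missing,
    # so unknown ticks are left unchanged.
    values[0] = ticks.get(values[0], values[0])
    pair = values[1] + "," + values[2]
    if pair in symbs:
        values[1] = str(symbs[pair])  # the question stores this code as a string
        del values[2]
    return values

rows = [
    ['6NH8', 'F', 'A', '0', '60541567', '60541567', '78.78', '20'],
    ['QMH8', 'F', 'B', '0', '60437395', '60437395', '950.5', '1'],
    ['ZZZZ', 'X', 'Y', '0', '1', '1', '1.0', '1'],  # made-up row, unknown codes
]
for row in rows:
    compress_row(row, ticks, symbs)
print(rows)
```

Each row now costs two dictionary lookups instead of a scan over ticks and symbs, and rows with unknown codes pass through untouched.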