Dict and List Manipulation in Python

Posted 2019-09-19 13:42

I have two files: one has only keys and the other has both keys and values. I have to match the keys from the key-only file and pull the corresponding values from the other file. When all the keys and values are in plain column format, I can write the matching keys and values to a new file without trouble. But I don't understand how to get the result when the value is a set/array type.

Input one in column format:

5216 3911 2 761.00 
2503 1417 13 102866.00
5570 50 2 3718.00 
5391 1534 3 11958.00 
5015 4078 1 817.00 
3430 299 1 5119.00 
4504 3369 2 3218.00  
4069 4020 2 17854.00 
5164 4163 1 107.00 
3589 3026 1 7363.00 

Input two in column format. The keys are pairs, i.e. col[0] and col[1] together form the key:

5391 1534 
5015 4078 
3430 299 
4504 3369  

Output for the above input, which is what I want:

5391 1534 3 11958.00 
5015 4078 1 817.00 
3430 299 1 5119.00 
4504 3369 2 3218.00 

Program

edges = {}
with open('Input_1.txt', 'r') as edge_data:
    for row in edge_data:
        col = row.strip().split()
        # Key on the pair in the first two columns; keep the other two as the value.
        edges[col[0], col[1]] = col[2], col[3]

# Then to do the queries, read through the second file and write out the matches:
with open('Input_2', 'r') as classified_data:
    with open('Output', 'w') as outfile:
        for row in classified_data:
            a, b = row.strip().split()
            # Try the pair in both orders.
            c = edges.get((a, b), edges.get((b, a)))
            outfile.write("%s %s %s\n" % (a, b, c))

The above program works well for the input types shown above, but I have no idea how to handle the inputs below.

I understand that I need to change the following statement from the program above, but I can't work out what it should be changed to:

edges[col[0], col[1]] = col[2], col[3]

New Input one

('3350', '2542') [6089.0, 4315.0] 
('2655', '1411') [559.0, 1220.0, 166.0, 256.0, 146.0, 528.0, 1902.0, 880.0, 2317.0, 2868.0] 
('4212', '1613') [150.0, 14184.0, 4249.0, 1250.0, 10138.0, 4281.0, 2846.0, 2205.0, 1651.0, 335.0, 5233.0, 149.0, 6816.0] 
('4750', '2247') [3089.0] 
('5305', '3341') [13122.0]

New Input two. The keys are pairs, i.e. col[0] and col[1] together form the key:

3350 2542
4750 2247
5305 3341

Expected output is

3350 2542 6089.0
3350 2542 4315.0
4750 2247 3089.0
5305 3341 13122.0

3 Answers
Emotional °昔
Reply #2 · 2019-09-19 14:28

I think @three_pineapples's eval approach is a good one.

Here is an alternative that only manipulates strings:

edges = {}
with open("Input_1.txt", "r") as edge_data:
    for row in edge_data:
        k, v = row.strip().split(")") # split as key, value
        k = " ".join(i.strip("'") for i in k.strip("(").split(", ")) # clean unwanted symbol and merge together
        v = v.strip(" []").split(", ") # get list value
        edges[k] = v

with open("Input_2", "r") as classified_data:
    for row in classified_data:
        k = row.strip()
        for v in edges.get(k, []):
            # Formatted print so the line behaves the same on Python 2 and 3.
            print("%s %s" % (k, v))
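
The string-only parsing above can also be wrapped in small functions, which makes it easy to add the reversed-pair fallback that the question's original program had. This is only a sketch: the names `parse_row` and `lookup` are made up for illustration, and the rows are held in a list instead of being read from `Input_1.txt`.

```python
def parse_row(row):
    """Turn "('3350', '2542') [6089.0, 4315.0]" into ("3350 2542", ["6089.0", "4315.0"])."""
    k, v = row.strip().split(")")            # key part, value part
    k = " ".join(i.strip("'") for i in k.strip("(").split(", "))
    v = v.strip(" []").split(", ")           # values stay as strings
    return k, v


def lookup(edges, a, b):
    """Return the values stored for the pair in either order, or an empty list."""
    return edges.get("%s %s" % (a, b)) or edges.get("%s %s" % (b, a), [])


# Sample rows copied from the question (assumed to stand in for Input_1.txt).
rows = [
    "('3350', '2542') [6089.0, 4315.0] ",
    "('4750', '2247') [3089.0] ",
]
edges = dict(parse_row(r) for r in rows)

for v in lookup(edges, "2542", "3350"):      # reversed order still matches
    print("%s %s %s" % ("2542", "3350", v))
```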
叛逆
Reply #3 · 2019-09-19 14:36

Use pattern matching with a regular expression:

import re
rec = re.compile(r"\('(\d+)',\s*'(\d+)'\)\s*\[(.*)\]")
matches = rec.match("('3350', '2542') [6089.0, 4315.0]")
print(matches.groups())
print(int(matches.group(1)))
print(int(matches.group(2)))
# list(...) so the values print as a list on Python 3 as well
print(list(map(float, matches.group(3).split(','))))

The output is

('3350', '2542', '6089.0, 4315.0')
3350
2542
[6089.0, 4315.0]

To save the data

a = int(matches.group(1))
b = int(matches.group(2))
data = list(map(float, matches.group(3).split(',')))  # a real list, reusable on Python 3
edges[a, b] = data

To get data and print the output

c = edges.get((a, b), edges.get((b, a))) or []  # empty list when the pair is missing
for value in c:
    print("%s %s %s" % (a, b, value))
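
Put together, the regex steps above might look like the sketch below. The helper names `load_edges` and `matching_rows` are hypothetical, the inputs are lists of lines (file objects would work the same way), and keys are kept as strings here, not converted with int(), so they compare directly with the tokens read from the second file.

```python
import re

# Same pattern as above: two quoted integers in a tuple, then a bracketed list.
ROW = re.compile(r"\('(\d+)',\s*'(\d+)'\)\s*\[(.*)\]")


def load_edges(lines):
    """Build {(a, b): [float, ...]} from lines like "('3350', '2542') [6089.0, 4315.0]"."""
    edges = {}
    for line in lines:
        m = ROW.match(line.strip())
        if m is None:                 # skip blank or malformed lines
            continue
        a, b, values = m.groups()
        edges[a, b] = [float(x) for x in values.split(",") if x.strip()]
    return edges


def matching_rows(edges, queries):
    """Return one "a b value" string per value of every queried pair."""
    out = []
    for line in queries:
        parts = line.split()
        if len(parts) != 2:           # ignore lines that are not a pair
            continue
        a, b = parts
        for value in edges.get((a, b)) or edges.get((b, a)) or []:
            out.append("%s %s %s" % (a, b, value))
    return out


input_one = [
    "('3350', '2542') [6089.0, 4315.0] ",
    "('2655', '1411') [559.0, 1220.0, 166.0] ",
    "('4750', '2247') [3089.0] ",
    "('5305', '3341') [13122.0]",
]
input_two = ["3350 2542", "4750 2247", "5305 3341"]

for row in matching_rows(load_edges(input_one), input_two):
    print(row)
```

With real files, the two lists could be replaced by the open file objects, and each returned row written to the output file followed by a newline.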
成全新的幸福
Reply #4 · 2019-09-19 14:36

I would suggest splitting the string on a different character, say ')'.

So you would do something like:

with open('Input_1.txt', 'r') as edge_data:    
    for row in edge_data:
        col = row.strip().split(')')

You then want to convert the string representations of the tuple and the list into objects you can work with. You can do this by using eval():

        key = eval(col[0]+')') # note I add the bracket back in that we split on
        value = eval(col[1])
        edges[key] = value

You now have a dictionary edges whose keys match the tuples in file one and whose values are the associated lists.

When you read in file 2, you will need to add another loop that iterates over the entries in the list. For example, something like:

for c in edges[(a,b)]:
    outfile.write("%s %s %s\n" % (a,b,c))

This will allow you to write a line to your output file for each entry in the list you read in from the first file.
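
As a variation on the eval() steps above, the standard library's ast.literal_eval accepts only Python literals (tuples, lists, numbers, strings), so it can be a safer choice when the input file is not fully trusted. The sketch below assumes the line format shown in the question; `parse_line` is a hypothetical helper name and the sample lines stand in for the first file.

```python
import ast


def parse_line(line):
    """Split on the first ')' and evaluate both halves as Python literals."""
    head, sep, tail = line.strip().partition(")")
    key = ast.literal_eval(head + sep)         # add the bracket back, e.g. ('3350', '2542')
    values = ast.literal_eval(tail.strip())    # e.g. [6089.0, 4315.0]
    return key, values


sample = [
    "('3350', '2542') [6089.0, 4315.0] ",
    "('4750', '2247') [3089.0] ",
]
edges = dict(parse_line(line) for line in sample)

lines_out = []
for a, b in [("3350", "2542"), ("4750", "2247")]:
    for c in edges.get((a, b), []):            # one output line per list entry
        lines_out.append("%s %s %s" % (a, b, c))

print("\n".join(lines_out))
```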
