Python OrderedDict issue

Posted 2019-08-15 17:29

I have a CSV file with one record per line (columns: Location, MovieDate, Formatted_Address, Lat, Lng). I have been told to use OrderedDict if I want to group by Location and append all the MovieDate values that share the same Location value.

Example data:

Location,MovieDate,Formatted_Address,Lat,Lng
    "Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
    "Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

For every row that has the same location (as in the example above), I'd like to produce output like this, so that there are no duplicate locations.

 "Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

What's wrong with my code using OrderedDict to do this?

from collections import OrderedDict

od = OrderedDict()
import csv
with open("MovieDictFormatted.csv") as f,open("MoviesCombined.csv" ,"w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc,rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc,vals in od.items():
        wr.writerow([loc]+vals)

What I end up with is something like this:

['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']

The issue is that the other columns don't show up in this output. How would I best include them? I would also prefer to make the MovieDate values one long string, as here: 'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers ' instead of:

'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '

Thanks guys, appreciate it. I'm a Python noob.

Changing row[0], row[1] to row[0], row[1:] unfortunately doesn't give me what I want. I only want to combine the values in the second column (MovieDate), not replicate all the other columns like this:

['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]

3 answers
我欲成王,谁敢阻挡
#2 · 2019-08-15 17:59

You just need a couple of changes: join the columns that sit between the location and the lat/lng, and use the lat and lng as part of the key so they are not duplicated:

import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # key on (Location, Lat, Lng); collect MovieDate + Formatted_Address
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))

Output:

Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

A League of Their Own comes first because its row precedes the Mad, Mad line in the input. row[1:-2] gets everything except the location, lat and lng. We store the lat and lng in the key tuple so that they are written once at the end of each row instead of being repeated for every movie.
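To see what the tuple key does without touching any files, here is a minimal sketch that runs on two in-memory rows. The rows are copied from the question; the names rows, key and merged are only illustrative.

```python
from collections import OrderedDict

# Two sample rows from the question: Location, MovieDate, Formatted_Address, Lat, Lng
rows = [
    ["Edgebrook Park, Chicago ", "Jun-7 A League of Their Own",
     "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
     "41.9998876", "-87.7627672"],
    ["Edgebrook Park, Chicago ", "Jun-9 It's a Mad, Mad, Mad, Mad World",
     "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",
     "41.9998876", "-87.7627672"],
]

od = OrderedDict()
for row in rows:
    key = (row[0], row[-2], row[-1])                    # (Location, Lat, Lng)
    od.setdefault(key, []).append(" ".join(row[1:-2]))  # MovieDate + address

# One output row per distinct key: location, grouped values, then lat and lng
merged = [[key[0]] + vals + list(key[1:]) for key, vals in od.items()]
```

Both input rows share the same key, so merged holds a single row with the lat and lng appearing only once at the end.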

Using names and unpacking might make it a little easier to follow:

with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # "lng" rather than "long", which would shadow a builtin type on Python 2
        loc, mov, form, lat, lng = row
        od.setdefault((loc, lat, lng), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))

Using csv.DictWriter to keep five columns:

import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.DictReader(f)  # field names are read from the header row
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    wr.writeheader()
    for row in r:
        # the first row seen for a location supplies the address, lat and lng
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                        MovieDate=[], Formatted_Address=row["Formatted_Address"]))
        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    for loc, vals in od.items():
        vals["MovieDate"] = ", ".join(vals["MovieDate"])
        wr.writerow(vals)

Output:

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

So the five columns remain intact: we joined the MovieDate values into a single string, and Formatted_Address is the same for every row of a given location, so we don't need to update it.

It turns out that, to match what you wanted, all we needed to do was concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.
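If you want the space-separated MovieDate string from the question rather than a comma-separated one, something along these lines might work. This is only a sketch: merge_movie_dates is a hypothetical helper name, it assumes the file has the header row shown in the question, it strips the trailing spaces from each MovieDate, and it uses io.StringIO (Python 3) in place of real files so it can be run as-is.

```python
import csv
import io
from collections import OrderedDict


def merge_movie_dates(src, dst):
    """Write one row per Location to dst, joining its MovieDate values with spaces."""
    reader = csv.DictReader(src)              # column names come from the header row
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    grouped = OrderedDict()                   # Location -> first row seen for it
    for row in reader:
        first = grouped.setdefault(row["Location"], dict(row, MovieDate=[]))
        first["MovieDate"].append(row["MovieDate"].strip())
    writer.writeheader()
    for first in grouped.values():
        first["MovieDate"] = " ".join(first["MovieDate"])
        writer.writerow(first)


# In-memory stand-in for the input file, using the two rows from the question
sample = io.StringIO(
    'Location,MovieDate,Formatted_Address,Lat,Lng\n'
    '"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,'
    '"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672\n'
    '"Edgebrook Park, Chicago ","Jun-9 It\'s a Mad, Mad, Mad, Mad World",'
    '"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672\n'
)
result = io.StringIO()
merge_movie_dates(sample, result)
output_rows = list(csv.reader(io.StringIO(result.getvalue())))
```

With real files on Python 3 you would pass handles opened with newline="", as the csv module documentation recommends.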

劫难
#3 · 2019-08-15 18:08

Assuming location is the first item of the row:

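grouped = {}  # avoid naming this "dict", which would shadow the builtin
for row in r:  # r is the csv.reader from the question, header already consumed
    if row[0] not in grouped:
        grouped[row[0]] = []
    grouped[row[0]].append(row[1:])

And for every location, you have the rest of every row that shares it:

for key, value in grouped.items():
    # value is a list of lists, so flatten it before writing one CSV row
    wr.writerow([key] + [cell for rest in value for cell in rest])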
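The membership check can also be left to collections.defaultdict. A small sketch, using location names and movie titles from the question's output but with rows shortened to two columns for brevity (that shortening, and the names rows, grouped and jefferson, are my assumptions):

```python
from collections import defaultdict

# Shortened sample rows: Location, MovieDate only
rows = [
    ["Jefferson Memorial Park, Chicago ", "Jun-12 Monsters University "],
    ["Commercial Club Playground, Chicago ", "Jun-12 Despicable Me 2"],
    ["Jefferson Memorial Park, Chicago ", "Jul-11 Frozen "],
]

grouped = defaultdict(list)   # a missing key starts out as an empty list
for row in rows:
    grouped[row[0]].append(row[1:])

jefferson = grouped["Jefferson Memorial Park, Chicago "]
```

Note that a plain dict or defaultdict only preserves insertion order on Python 3.7 and later; on older versions OrderedDict is the safer choice if the output order matters.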
倾城 Initia
#4 · 2019-08-15 18:14

Let's try changing

od.setdefault(loc, []).append(rest) 

To

od[loc] = ' '.join([od.get(loc, ''), rest]).strip()

And then, since each value is now a single string rather than a list, write it as one cell:

wr.writerow([loc, vals])