I have a CSV file with one record per line and the columns Location, MovieDate, Formatted_Address, Lat and Lng. I have been told to use OrderedDict if I want to group by Location and append all the MovieDate values that share the same Location value.
Example of the data:
Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
For every row that has the same location (as in this example), I'd like to produce output like this, so that there are no duplicate locations:
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
What's wrong with my code using OrderedDict to do this?
from collections import OrderedDict
od = OrderedDict()
import csv
with open("MovieDictFormatted.csv") as f, open("MoviesCombined.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc] + vals)
What I end up with is something like this:
['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']
The issue is that the other columns don't show up in this case. How would I best include them? I would also prefer to make the MovieDate values one long string, like this:
'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers '
instead of:
'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '
Thanks guys, appreciate it. I'm a Python noob.
Changing row[0], row[1] to row[0], row[1:] unfortunately doesn't give me what I want. I only want to add the values in the second column (MovieDate), not replicate all the other columns like this:
['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]
You just need a couple of changes: keep the lat and lng together, and, to remove the duplicate lat and lng values, also use them as part of the key:
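A minimal sketch of that change, assuming the five-column layout from the question. The helper name combine_rows and the in-memory SAMPLE are made up for illustration, and the sketch assumes the address is the same for every row of a location; a real script would read MovieDictFormatted.csv and write MoviesCombined.csv as in the question's code.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical in-memory stand-in for MovieDictFormatted.csv (rows from the question).
SAMPLE = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''


def combine_rows(rows):
    """Group rows by (Location, Lat, Lng). Illustrative helper, not from the original post."""
    od = OrderedDict()
    for row in rows:
        # Location, lat and lng form the key, so they are stored once per group.
        key = (row[0], row[-2], row[-1])
        # row[1:-2] is everything except the location, lat and lng.
        od.setdefault(key, []).append(row[1:-2])
    out = []
    for (loc, lat, lng), vals in od.items():
        movies = " ".join(v[0] for v in vals)  # one long MovieDate string
        address = vals[0][1]                   # assumed identical within a group
        out.append([loc, movies, address, lat, lng])
    return out


reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
combined = combine_rows(reader)

# Write to a string buffer here; a real script would write to a file instead.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(combined)
print(buf.getvalue())
```

With the two sample rows this yields a single Edgebrook Park row whose MovieDate column holds both movies, in input order.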
In the output, A League of Their Own is first because it comes before the Mad, Mad line in the input. row[1:-2] gets everything except the location, lat and lng; we store the lat and lng in our key tuple to avoid writing them again at the end of each row. Using names and unpacking might make it a little easier to follow:
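One way the unpacking version could look, as a sketch rather than the original code. The rows list is typed in by hand from the question's sample, and the "address"/"movies" entry layout is just one illustrative choice.

```python
from collections import OrderedDict

ADDRESS = "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA"

# Hypothetical already-parsed rows, in the question's column order.
rows = [
    ["Edgebrook Park, Chicago ", "Jun-7 A League of Their Own",
     ADDRESS, "41.9998876", "-87.7627672"],
    ["Edgebrook Park, Chicago ", "Jun-9 It's a Mad, Mad, Mad, Mad World",
     ADDRESS, "41.9998876", "-87.7627672"],
]

od = OrderedDict()
# Unpacking gives every column a name instead of an index.
for loc, movie, address, lat, lng in rows:
    key = (loc, lat, lng)
    # The first time a key is seen, remember its address and start a movie list.
    entry = od.setdefault(key, {"address": address, "movies": []})
    entry["movies"].append(movie)

# Rebuild one five-column row per location.
combined = [
    [loc, " ".join(entry["movies"]), entry["address"], lat, lng]
    for (loc, lat, lng), entry in od.items()
]
print(combined)
```

The named variables make it clear that only the MovieDate value is accumulated, while the address, lat and lng are kept once per location.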
Using csv.DictWriter to keep the five columns:
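A hedged sketch of the DictReader/DictWriter route. The function name combine_csv and the in-memory SAMPLE are assumptions for illustration; the column names come from the question's header.

```python
import csv
import io
from collections import OrderedDict

# Hypothetical in-memory stand-in for MovieDictFormatted.csv (rows from the question).
SAMPLE = '''Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
"Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
'''


def combine_csv(infile, outfile):
    """Merge rows sharing Location/Lat/Lng. Illustrative helper, not from the original post."""
    reader = csv.DictReader(infile)
    od = OrderedDict()
    for row in reader:
        key = (row["Location"], row["Lat"], row["Lng"])
        if key in od:
            # Same place again: only the MovieDate string grows.
            od[key]["MovieDate"] += " " + row["MovieDate"]
        else:
            # First sighting: keep the whole row, including Formatted_Address.
            od[key] = dict(row)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(od.values())


out = io.StringIO()
combine_csv(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

With real files on Python 3, opening both with newline="" (for example open("MoviesCombined.csv", "w", newline="")) is what the csv module documentation recommends, and it avoids blank lines between rows on Windows.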
So the five columns remain intact: we joined the MovieDate values into single strings, and Formatted_Address is always the same for a given location, so we don't need to update that. It turns out that, to match what you wanted, all we needed to do was concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.

Assuming location is the first item of the row, then for every location you have the entire rest of the row. Let's try changing row[0], row[1] to row[0], row[1:] and then keep the rest of the loop as is.
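To see what that change does, here is a small sketch with two hand-typed rows from the question (the rows list is illustrative). Each appended value is now the whole remainder of the row, so the grouped result is a list of lists.

```python
from collections import OrderedDict

ADDRESS = "Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA"

# Hypothetical already-parsed rows, in the question's column order.
rows = [
    ["Edgebrook Park, Chicago ", "Jun-7 A League of Their Own",
     ADDRESS, "41.9998876", "-87.7627672"],
    ["Edgebrook Park, Chicago ", "Jun-9 It's a Mad, Mad, Mad, Mad World",
     ADDRESS, "41.9998876", "-87.7627672"],
]

od = OrderedDict()
for row in rows:
    loc, rest = row[0], row[1:]  # rest is a list of the four remaining columns
    od.setdefault(loc, []).append(rest)

for loc, vals in od.items():
    # vals holds one four-item list per input row for this location.
    print([loc] + vals)
```

Because the address, lat and lng travel along with every movie, they are repeated once per row, which is the nested output the question's follow-up describes.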