I have a df:
import pandas as pd
import numpy as np
import datetime as DT
import hmac
from geopy.geocoders import Nominatim
from geopy.distance import geodesic  # vincenty was removed in geopy 2.0
df
city_name state_name county_name
0 WASHINGTON DC DIST OF COLUMBIA
1 WASHINGTON DC DIST OF COLUMBIA
2 WASHINGTON DC DIST OF COLUMBIA
3 WASHINGTON DC DIST OF COLUMBIA
4 WASHINGTON DC DIST OF COLUMBIA
5 WASHINGTON DC DIST OF COLUMBIA
6 WASHINGTON DC DIST OF COLUMBIA
7 WASHINGTON DC DIST OF COLUMBIA
8 WASHINGTON DC DIST OF COLUMBIA
9 WASHINGTON DC DIST OF COLUMBIA
I want to get the latitude and longitude coordinates for any one of the columns in the data frame above. The documentation (http://geopy.readthedocs.org/en/latest/#data) is pretty straightforward when geocoding individual locations:
>>> from geopy.geocoders import Nominatim
>>> geolocator = Nominatim(user_agent="my-application")
>>> location = geolocator.geocode("175 5th Avenue NYC")
>>> print(location.address)
Flatiron Building, 175, 5th Avenue, Flatiron, New York, NYC, New York, ...
>>> print((location.latitude, location.longitude))
(40.7410861, -73.9896297241625)
>>> print(location.raw)
{'place_id': '9167009604', 'type': 'attraction', ...}
However, I want to apply the function to each row in the df and make a new column. I've tried the following:
df['city_coord'] = geolocator.geocode(lambda row: 'state_name' (row))
but I think I'm missing something in my code because I get the following:
city_name state_name county_name coordinates
0 WASHINGTON DC DIST OF COLUMBIA None
1 WASHINGTON DC DIST OF COLUMBIA None
2 WASHINGTON DC DIST OF COLUMBIA None
3 WASHINGTON DC DIST OF COLUMBIA None
4 WASHINGTON DC DIST OF COLUMBIA None
5 WASHINGTON DC DIST OF COLUMBIA None
6 WASHINGTON DC DIST OF COLUMBIA None
7 WASHINGTON DC DIST OF COLUMBIA None
8 WASHINGTON DC DIST OF COLUMBIA None
9 WASHINGTON DC DIST OF COLUMBIA None
I would like something like this, ideally using a lambda function:
city_name state_name county_name city_coord
0 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
1 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
2 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
3 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
4 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
5 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
6 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
7 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
8 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
9 WASHINGTON DC DIST OF COLUMBIA 38.8949549, -77.0366456
10 GLYNCO GA GLYNN 31.2224512, -81.5101023
I appreciate any help. After I get the coordinates I'd like to map them, so any recommended resources for mapping coordinates are greatly appreciated too. Thanks!
Upvote and accept @EdChum's answer; I just wanted to add to it. His method works perfectly, but from personal experience I'd like to share a few things:
When geocoding, if you have many repeating city/state combinations, it's much faster to send only one of each to be geocoded and then copy the result to the other rows.
This is very helpful for large data and can be done in two ways: drop_duplicates or groupby on the city/state combination, geocode the first row of each combination (e.g. by calling head(1)), then copy the result to the remaining rows. The reason is that each call to Nominatim carries a small latency, even if you query the same city/state several times in a row. This latency adds up as your data grows, causing long delays and possible timeouts.
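A minimal offline sketch of the deduplicate-then-copy idea. The `fake_geocode` function is a hypothetical stand-in for `Nominatim(user_agent=...).geocode`, so it runs without network access. Its canned coordinates are copied from the question's desired output. The "CITY, STATE" query format is an assumption. The sketch shows both ways: drop_duplicates and groupby(...).head(1) select the same first row per combination. The coordinates are then merged back onto every row.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for geopy's Location: only .latitude/.longitude are used.
Location = namedtuple("Location", ["latitude", "longitude"])

# Canned results copied from the question's desired output (illustration only).
CANNED = {
    "WASHINGTON, DC": Location(38.8949549, -77.0366456),
    "GLYNCO, GA": Location(31.2224512, -81.5101023),
}
calls = []  # every query sent to the stub, so requests can be counted


def fake_geocode(query):
    """Hypothetical offline substitute for Nominatim(user_agent=...).geocode."""
    calls.append(query)
    return CANNED.get(query)  # None when nothing matches


df = pd.DataFrame({
    "city_name": ["WASHINGTON"] * 10 + ["GLYNCO"],
    "state_name": ["DC"] * 10 + ["GA"],
    "county_name": ["DIST OF COLUMBIA"] * 10 + ["GLYNN"],
})
keys = ["city_name", "state_name"]

# Way 1: drop_duplicates keeps the first row of each city/state combination.
uniq = df.drop_duplicates(subset=keys)[keys].copy()
# Way 2: groupby(...).head(1) selects the same first rows.
uniq_via_groupby = df.groupby(keys).head(1)[keys]

# Geocode each combination exactly once.
locs = [fake_geocode(f"{c}, {s}") for c, s in zip(uniq["city_name"], uniq["state_name"])]
uniq["city_coord"] = [(loc.latitude, loc.longitude) if loc is not None else None for loc in locs]

# Copy each result to every row sharing that city/state (a left merge keeps the left row order).
df = df.merge(uniq, on=keys, how="left")
print(len(calls), "geocoder calls for", len(df), "rows")
```

Here 11 rows need only 2 geocoder calls. The savings grow with the number of repeated combinations.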
Again, this is all from personal experience. Just keep it in mind for future use, even if it doesn't benefit you now.
You can call apply and pass the function you want to execute on every row. Each call to geocode returns a Location object (or None if nothing matched), and you can then access its latitude and longitude attributes.
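Below is a minimal offline sketch of the row-wise apply approach. To keep it runnable without network access, a hypothetical stub `geocode` returns a canned value copied from the question's desired output. That value is wrapped in a namedtuple standing in for geopy's Location. With geopy you would pass `Nominatim(user_agent=...).geocode` instead. The sketch shows the two separate apply steps and the same thing chained into one line.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for geopy's Location: only .latitude/.longitude are used.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode(query):
    """Hypothetical offline stub for geolocator.geocode (canned value from the question).

    With geopy (network access required) you would use instead:
        from geopy.geocoders import Nominatim
        geocode = Nominatim(user_agent="my-application").geocode
    """
    return {"DC": Location(38.8949549, -77.0366456)}.get(query)  # None if no match


df = pd.DataFrame({
    "city_name": ["WASHINGTON"] * 3,
    "state_name": ["DC"] * 3,
    "county_name": ["DIST OF COLUMBIA"] * 3,
})

# Step 1: apply calls geocode once per row and keeps the returned objects.
locations = df["state_name"].apply(geocode)

# Step 2: read the attributes of each result, guarding against None (no match).
df["city_coord"] = locations.apply(
    lambda loc: (loc.latitude, loc.longitude) if loc is not None else None
)

# The same two steps chained into one line by calling apply twice.
df["city_coord_1line"] = df["state_name"].apply(geocode).apply(
    lambda loc: (loc.latitude, loc.longitude) if loc is not None else None
)
print(df)
```

This still calls the geocoder once per row. With many repeated values, geocoding each distinct value once is cheaper.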
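Or do it in a one-liner by calling apply twice.
Also, your attempt
geolocator.geocode(lambda row: 'state_name' (row))
did nothing useful: it passes the lambda object itself to geocode instead of each row's value, so geocode finds nothing and returns a single None, which pandas then broadcasts to every row. That is why you have a column full of None values.
EDIT: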
@leb makes an interesting point here: if you have many duplicate values, it is more efficient to geocode each unique value once and then map the results back onto the rows.
That approach gets all the unique values using unique, constructs a dict from them, and then calls map to perform the lookup and add the coords. This is more efficient than geocoding row-wise.
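A minimal offline sketch of the unique / dict / map idea. The stub `geocode` is hypothetical and returns canned coordinates copied from the question's desired output, so no network access is needed. The lookup key here is the city and state combined into one string, which is an assumption. Geocoding state_name alone, as in the question, would return one point per state rather than per city. A call counter shows that each distinct value is geocoded only once.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for geopy's Location: only .latitude/.longitude are used.
Location = namedtuple("Location", ["latitude", "longitude"])
calls = []  # every query sent to the stub, so requests can be counted


def geocode(query):
    """Hypothetical offline stub for geolocator.geocode (canned values from the question)."""
    calls.append(query)
    canned = {
        "WASHINGTON, DC": Location(38.8949549, -77.0366456),
        "GLYNCO, GA": Location(31.2224512, -81.5101023),
    }
    return canned.get(query)  # None when nothing matches


df = pd.DataFrame({
    "city_name": ["WASHINGTON"] * 10 + ["GLYNCO"],
    "state_name": ["DC"] * 10 + ["GA"],
})

# One query string per row; combining city and state is an assumption.
query = df["city_name"] + ", " + df["state_name"]

# unique: every distinct query appears once.
uniq_vals = query.unique()

# dict: geocode each distinct query exactly once and keep (lat, lon).
coords = {}
for val in uniq_vals:
    loc = geocode(val)
    coords[val] = (loc.latitude, loc.longitude) if loc is not None else None

# map: look up every row's query in the dict.
df["city_coord"] = query.map(coords)
print(df)
```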