OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function to take an input - country name - and return the latitude and longitude. I use apply to run the function and it returns a Pandas series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to python and still RTFMing. BTW, the geocoder function works great.
# Import libraries
import os
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
def locate(x):
    geolocator = Nominatim()
    # print(x) # debug
    try:
        #Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except:
        #didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat,lon) #debug
    return lat, lon # Note: also tried return { 'LAT': lat, 'LON': lon }
df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index() #works perfectly
df_geo_in['LAT'], df_geo_in['LON'] = df_geo_in.applymap(locate)
# error: returns more than 2 values - default index + column with results
I also tried
df_geo_in['LAT','LON'] = df_geo_in.applymap(locate)
I get a single dataframe with no index and a single column with the series in it.
I've tried a number of other methods, including 'applymap':
source_cols = ['LAT','LON']
new_cols = [str(x) for x in source_cols]
df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY'])
df_geo_in[new_cols] = df_geo_in.applymap(locate)
which returned an error after a long time:
ValueError: Columns must be same length as key
I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method without success.
The goal is to geocode 166 unique countries, then join it back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert series into dataframes and this is the first time I've tried to use apply.
Thanks in advance - ancient C programmer
I'm assuming that df_geo is a df with a single column so I believe the following should work:
change:
to
then you should be able to assign like so:
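A minimal sketch of one way this could look, assuming the change is to have locate return a pd.Series instead of a tuple, and that the function is applied to the COUNTRY column alone. FAKE_COORDS is a hypothetical stand-in for the geolocator.geocode call so the example runs without network access; the sample countries are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for geolocator.geocode so the sketch runs offline.
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.0)}

def locate(country):
    """Return latitude and longitude as a labelled Series."""
    # Unknown names fall back to NaN, mirroring the except branch in the question.
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon], index=["LAT", "LON"])

# Made-up sample data: one row per country.
df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Series.apply with a function that returns a Series yields a DataFrame,
# which can be assigned to two new columns in one step.
df_geo_in[["LAT", "LON"]] = df_geo_in["COUNTRY"].apply(locate)
print(df_geo_in)
```

Because apply runs on a single column here, locate is called once per row rather than once per cell.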
What you tried to do was assign the result of applymap to 2 new columns, which is incorrect here as applymap is designed to work on every element in a df, so unless the lhs has the same expected shape this won't give the desired result.
Your latter method is also incorrect because you drop the duplicate countries and then expect this to assign every country geolocation back, but the shape is different.
It is probably quicker for large df's to create the geolocation non-duplicated df's and then merge this back to your larger df like so:
This will create a df of non-duplicated countries with their geolocations, and then we perform a left merge back to the master df.
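A rough sketch of that deduplicate-then-merge approach, under the same assumptions as before: FAKE_COORDS is a hypothetical replacement for the real geocoder, and the ADDRESS values and the names geo_lookup and df_merged are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for geolocator.geocode so the sketch runs offline.
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.0)}

def locate(country):
    """Return latitude and longitude as a labelled Series."""
    lat, lon = FAKE_COORDS.get(country, (np.nan, np.nan))
    return pd.Series({"LAT": lat, "LON": lon})

# Made-up master table: many addresses, few distinct countries.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C"],
    "COUNTRY": ["France", "France", "Japan"],
})

# One row per distinct country, so each country is geocoded only once.
geo_lookup = df_addr[["COUNTRY"]].drop_duplicates().reset_index(drop=True)
geo_lookup[["LAT", "LON"]] = geo_lookup["COUNTRY"].apply(locate)

# The left merge keeps every address row and attaches its country's coordinates.
df_merged = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(df_merged)
```

With 166 countries and 188K addresses, this keeps the slow geocoding calls to 166 and leaves the fan-out to the merge.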
Always easier to test with some sample data, but please try the following zip function to see if it works.
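A sketch of how the zip approach might look, assuming locate keeps returning a plain (lat, lon) tuple as in the question. FAKE_COORDS is again a hypothetical offline stand-in for the geocoder, and the sample countries are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for geolocator.geocode so the sketch runs offline.
FAKE_COORDS = {"France": (46.6, 2.2), "Japan": (36.6, 138.0)}

def locate(country):
    """Return a (lat, lon) tuple, with NaN for names that cannot be resolved."""
    return FAKE_COORDS.get(country, (np.nan, np.nan))

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# apply gives a Series of tuples; zip(*...) transposes it into
# one sequence of latitudes and one sequence of longitudes.
df_geo_in["LAT"], df_geo_in["LON"] = zip(*df_geo_in["COUNTRY"].apply(locate))
print(df_geo_in)
```

This leaves the original return statement untouched, at the cost of building the intermediate Series of tuples.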