How to fill missing geo location in datasets? [clo

Posted 2019-09-19 15:27

Question:

I have a dataset in which location names and coordinates are missing, sometimes both in the same row. I want to fill in the gaps so that I can proceed with further analysis. The data was harvested from Twitter, so I did not create it; this is how it arrived, and I need to fill the gaps somehow before continuing.

Option 1: I can use either userLocation or userTimezone to find the coordinates.

Input:

userLocation,   userTimezone,   Coordinates,
India,          Hawaii,    {u'type': u'Point', u'coordinates': [73.8567, 18.5203]}
California,     USA     
          ,     New Delhi,  
Ft. Sam Houston,Mountain Time (US & Canada),{u'type': u'Point', u'coordinates': [86.99643, 23.68088]}
Kathmandu,Nepal, Kathmandu, {u'type': u'Point', u'coordinates': [85.3248024, 27.69765658]}

Expected Output

userLocation,  userTimezone,   Coordinates_one, Coordinates_two
    India,          Hawaii,         73.8567,         18.5203
    California,     USA,            [fill this]      [fill this]
    [Fill this],    New Delhi,      [fill this]      [fill this]
    Ft. Sam Houston,Mountain Time (US & Canada), 86.99643, 23.68088
    Kathmandu,      Kathmandu,      85.3248024,      27.69765658

Is it possible to write a script in Python or pandas that fills in the missing location names and coordinates at the same time and formats the output properly?

I understand that neither Python nor pandas has a magic package for this, but something to start with would be helpful.

I asked this question on the GIS site but did not get much help there. This is the first time I have worked with a geolocation dataset, and I have no clue how to start. If the question is not suitable, please leave a comment asking me to delete it rather than downvoting.

Answer 1:

As others have mentioned on your GIS question, there is no magical way to produce something accurate, but I would play around with geopy. I assume you are able to loop over your missing data; here is example code and output demonstrating geopy:

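from geopy.geocoders import Nominatim

# Current geopy releases require an identifying user agent; this name is a placeholder.
geolocator = Nominatim(user_agent="geo-gap-filler")

for location in ('California USA', 'New Delhi'):
    geoloc = geolocator.geocode(location)  # returns None when nothing matches
    if geoloc is None:
        print(location, ': no match')
        continue
    print(location, ':', geoloc, geoloc.latitude, geoloc.longitude)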

Output:

California USA : California, United States of America 36.7014631 -118.7559974 
New Delhi : New Delhi, New Delhi District, Delhi, India 28.6138967 77.2159562

You may want to try different geocoding services (see the geopy docs). Some of these services take additional arguments; e.g. Nominatim can take the "country_bias" keyword, which biases results toward the given country (newer geopy releases replace it with the country_codes argument of geocode()).
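To connect this back to the table in the question, here is a minimal sketch of how the gap filling could be wired up. It is only one possible approach, not a definitive implementation. The helper names (parse_point, fill_row, fake_geocode) are made up for illustration, rows are plain dicts to keep the example dependency-free, and the geocoder is passed in as a callable so the logic can be tried without network access. The stub lookup table reuses the values from the output above. Note that the Coordinates column is GeoJSON, so the first number is the longitude and the second the latitude.

```python
import ast


def parse_point(raw):
    """Return (lon, lat) from a GeoJSON-like string, or None if the cell is empty."""
    if not raw or not raw.strip():
        return None
    point = ast.literal_eval(raw.strip())  # safely parses the {u'type': ...} literal
    lon, lat = point["coordinates"]
    return lon, lat


def fill_row(row, geocode):
    """Fill missing fields of one row.

    geocode is any callable mapping a query string to
    (address, latitude, longitude), or None when nothing matches.
    """
    out = {
        "userLocation": (row.get("userLocation") or "").strip(),
        "userTimezone": (row.get("userTimezone") or "").strip(),
        "Coordinates_one": None,  # longitude
        "Coordinates_two": None,  # latitude
    }
    point = parse_point(row.get("Coordinates") or "")
    address = None
    if point is None:
        # Assumption: try the location text first, then fall back to the timezone text.
        for query in (out["userLocation"], out["userTimezone"]):
            if not query:
                continue
            hit = geocode(query)
            if hit is not None:
                address, lat, lon = hit
                point = (lon, lat)
                break
    if point is not None:
        out["Coordinates_one"], out["Coordinates_two"] = point
    if not out["userLocation"] and address:
        out["userLocation"] = address
    return out


# Hypothetical stand-in for a real geocoder; values copied from the output above.
FAKE_RESULTS = {
    "California": ("California, United States of America", 36.7014631, -118.7559974),
    "New Delhi": ("New Delhi, New Delhi District, Delhi, India", 28.6138967, 77.2159562),
}


def fake_geocode(query):
    return FAKE_RESULTS.get(query)


rows = [
    {"userLocation": "India", "userTimezone": "Hawaii",
     "Coordinates": "{u'type': u'Point', u'coordinates': [73.8567, 18.5203]}"},
    {"userLocation": "California", "userTimezone": "USA", "Coordinates": ""},
    {"userLocation": "", "userTimezone": "New Delhi", "Coordinates": ""},
]

filled = [fill_row(row, fake_geocode) for row in rows]
for row in filled:
    print(row)
```

To use a real service, the callable could wrap geolocator.geocode and return (geoloc.address, geoloc.latitude, geoloc.longitude), or None when geopy finds no match. Rows that already have coordinates but no location name are left untouched here; filling those would need reverse geocoding, which this sketch does not attempt.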