I have a dataset in which some rows are missing location names, coordinates, or both. I want to fill in these gaps so that I can proceed with further analysis. The data was harvested from Twitter, so I did not create it; this is how it arrived, and I need to fill in the gaps somehow before continuing.
Option 1: I can use either userLocation or userTimezone to find the coordinates.
Input:
userLocation, userTimezone, Coordinates,
India, Hawaii, {u'type': u'Point', u'coordinates': [73.8567, 18.5203]}
California, USA
, New Delhi,
Ft. Sam Houston,Mountain Time (US & Canada),{u'type': u'Point', u'coordinates': [86.99643, 23.68088]}
Kathmandu,Nepal, Kathmandu, {u'type': u'Point', u'coordinates': [85.3248024, 27.69765658]}
Expected Output
userLocation, userTimezone, Coordinates_one, Coordinates_two
India, Hawaii, 73.8567, 18.5203
California, USA, [fill this] [fill this]
[Fill this], New Delhi, [fill this] [fill this]
Ft. Sam Houston,Mountain Time (US & Canada), 86.99643, 23.68088
Kathmandu, Kathmandu, 85.3248024, 27.69765658
Is it possible to write a script in Python or pandas that fills in the missing location names and coordinates and formats the output as shown above?
I understand that neither Python nor pandas has a magic package for this, but something to start with would be helpful.
I asked this question on the GIS site but did not get much help there. This is the first time I have worked with a geolocation dataset, and I have no idea where to start. If the question is not suitable, please leave a comment so I can delete it instead of downvoting.
As others have mentioned on your GIS question, there is no magical way to produce something accurate, but I would play around with geopy. I assume you are able to loop over the rows with missing data; example code and output demonstrating geopy:
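A minimal sketch of such a loop, under a few assumptions: each row is a dict with the three column names from the question, the Coordinates field holds the GeoJSON-like string shown in the input (longitude first, then latitude), and userLocation is tried before userTimezone when building the query. The helper names (`parse_point`, `fill_row`, `make_nominatim_geocode`) and the `user_agent` value are made up for illustration. The geocoder is passed in as a plain callable, so the row logic can be tried offline; only `make_nominatim_geocode` touches geopy and the network.

```python
import ast


def parse_point(raw):
    """Return (lon, lat) from a GeoJSON-like string, or None if empty or invalid."""
    if not raw or not raw.strip():
        return None
    try:
        point = ast.literal_eval(raw.strip())  # the u'...' prefix is still valid in Python 3
        lon, lat = point["coordinates"]
        return float(lon), float(lat)
    except (ValueError, SyntaxError, KeyError, TypeError):
        return None


def fill_row(row, geocode):
    """Fill a missing name and/or coordinates in one row.

    `geocode` is any callable that takes a query string and returns an object
    with .address, .latitude and .longitude (as geopy geocoders do), or None.
    """
    name = (row.get("userLocation") or "").strip()
    zone = (row.get("userTimezone") or "").strip()
    coords = parse_point(row.get("Coordinates"))

    query = name or zone  # assumption: prefer the location, fall back to the timezone
    if query and (coords is None or not name):
        hit = geocode(query)
        if hit is not None:
            if coords is None:
                coords = (hit.longitude, hit.latitude)  # keep the lon, lat order of the input
            if not name:
                name = hit.address

    lon, lat = coords if coords else (None, None)
    return {
        "userLocation": name,
        "userTimezone": zone,
        "Coordinates_one": lon,
        "Coordinates_two": lat,
    }


def make_nominatim_geocode(user_agent="my-tweet-geocoder"):
    """Build a rate-limited geopy geocoder (needs geopy and network access)."""
    # Imported lazily so the helpers above work without geopy installed.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # Public Nominatim servers expect roughly one request per second at most.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

With the rows loaded as a list of dicts, `filled = [fill_row(r, make_nominatim_geocode()) for r in rows]` would be one way to run it (build the geocoder once and reuse it in practice). Rows that already have both a name and coordinates never trigger a lookup, and a timezone such as "Mountain Time (US & Canada)" is unlikely to geocode to anything meaningful, so expect some rows to stay empty.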
Output:
You may want to try different geocoding services (see the geopy docs). Some of these services take additional arguments, e.g. Nominatim can take the "country_bias" keyword, which biases results toward the given country.
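One way to combine several services is to try them in order and keep the first hit. The sketch below assumes each geocoder is a callable that returns None when nothing is found; `geocode_with_fallback` and `build_geocoders` are hypothetical helper names, and the choice of Nominatim plus Photon is only an example. Note that recent geopy releases restrict Nominatim results with a `country_codes` argument to `geocode()` rather than the older "country_bias" keyword, so check which one your installed version accepts.

```python
from functools import partial


def geocode_with_fallback(query, geocoders):
    """Try each geocoding callable in order; return the first non-None result."""
    for geocode in geocoders:
        try:
            hit = geocode(query)
        except Exception:
            # Timeouts, quota errors, service outages: move on to the next service.
            continue
        if hit is not None:
            return hit
    return None


def build_geocoders(user_agent="my-tweet-geocoder", country="in"):
    """Example service list (needs geopy and network access)."""
    # Imported lazily so geocode_with_fallback can be used without geopy.
    from geopy.geocoders import Nominatim, Photon

    nominatim = Nominatim(user_agent=user_agent)
    photon = Photon(user_agent=user_agent)
    return [
        # Assumption: a geopy version whose Nominatim.geocode accepts country_codes.
        partial(nominatim.geocode, country_codes=country),
        photon.geocode,
    ]
```

Catching every exception keeps the sketch short; in real use it is probably better to catch the specific errors from `geopy.exc` and log them, so that a misconfigured service does not fail silently.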