reverse geocoding to extract address components

Posted 2019-07-28 21:34

Question:

I'm trying to reverse geocode with R. I first used ggmap but couldn't get it to work with my API key. Now I'm trying it with googleway.

newframe[,c("Front.lat","Front.long")]

  Front.lat Front.long
1 -37.82681   144.9592
2 -37.82681   145.9592

newframe$address <- apply(newframe, 1, function(x){
  google_reverse_geocode(location = as.numeric(c(x["Front.lat"], x["Front.long"])),
                         key = "xxxx")
})

This returns the responses as a list, but I can't figure out its structure.

I'm struggling to work out how to extract the address components listed below as variables in newframe:

postal_code, administrative_area_level_1, administrative_area_level_2, locality, route, street_number

I would prefer each address component as a separate variable.

Answer 1:

After reverse geocoding into newframe$address, the address components can be extracted as follows:

# Flag the valid ("OK" status) responses (other statuses include "ZERO_RESULTS", "REQUEST_DENIED", etc.).
sel <- sapply(seq_len(nrow(newframe)), function(x){
  newframe$address[[x]]$status == 'OK'
})

# Get the address_components of the first result (i.e. best match) for each valid response.
# Iterate over which(sel) so that failed rows are skipped instead of being read by position.
address.components <- lapply(which(sel), function(x){
  newframe$address[[x]]$results$address_components[[1]]
})

# Get all possible component types.
all.types <- unique(unlist(sapply(c(1: length(address.components)), function(x){
  unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
})))

# Get "long_name" values of the address_components for each type present (the other option is "short_name").
all.values <- lapply(c(1: length(address.components)), function(x){
  types <- unlist(lapply(address.components[[x]]$types, function(l) l[[1]]))
  matches <- match(all.types, types)
  values <- address.components[[x]]$long_name[matches]
})

# Bind results into a dataframe.
all.values <- do.call("rbind", all.values)
all.values <- as.data.frame(all.values)
names(all.values) <- all.types

# Add columns and update original data frame.
newframe[, all.types] <- NA
newframe[sel,][, all.types] <- all.values

Note that I've kept only the first type given per component. This effectively skips the "political" type, which appears in multiple components and is likely superfluous, e.g. "administrative_area_level_1, political".
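The same steps can also be wrapped in a small per-response helper. The following is only a sketch: `extract_components` is a hypothetical name, and `mock_ok` / `mock_failed` are hand-built stand-ins (trimmed from the Southbank example) that assume the parsed response is a list with a `status` string and a `results` data frame holding an `address_components` list column.

```r
# Mock of one parsed response; a real one would come from google_reverse_geocode().
comps <- data.frame(
  long_name  = c("45", "Clarke Street", "Southbank", "Victoria", "3006"),
  short_name = c("45", "Clarke St", "Southbank", "VIC", "3006"),
  stringsAsFactors = FALSE
)
comps$types <- list("street_number", "route", c("locality", "political"),
                    c("administrative_area_level_1", "political"), "postal_code")
results <- data.frame(formatted_address = "45 Clarke St, Southbank VIC 3006, Australia",
                      stringsAsFactors = FALSE)
results$address_components <- list(comps)
mock_ok     <- list(results = results, status = "OK")
mock_failed <- list(results = list(), status = "ZERO_RESULTS")

# Hypothetical helper: named vector of the wanted components for ONE response,
# with NA for anything missing or for a non-"OK" response.
extract_components <- function(res, wanted, field = "long_name") {
  out <- setNames(rep(NA_character_, length(wanted)), wanted)
  if (!identical(res$status, "OK")) return(out)
  comps <- res$results$address_components[[1]]   # first result = best match
  # Keep only the first type of each component (drops "political").
  first_type <- vapply(comps$types, function(t) t[[1]], character(1))
  out[] <- comps[[field]][match(wanted, first_type)]
  out
}

wanted <- c("postal_code", "administrative_area_level_1",
            "administrative_area_level_2", "locality", "route", "street_number")
parsed <- t(sapply(list(mock_ok, mock_failed), extract_components, wanted = wanted))
parsed <- as.data.frame(parsed, stringsAsFactors = FALSE)
print(parsed)
```

With real data, the list passed to `sapply()` would be `newframe$address`, and the result could be joined on with `cbind(newframe, parsed)`.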



Answer 2:

Google's API returns the response as JSON, which naturally forms nested lists when translated into R. Internally, googleway does this with jsonlite::fromJSON().

In googleway I've given you the choice of returning the raw JSON or a list, through the simplify argument.

I've deliberately returned ALL the data from Google's response and left it up to the user to extract the elements they're interested in through usual list-subsetting operations.
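As a rough illustration of those list-subsetting operations, here is a sketch on a hand-built mock. The object `res` below is an assumption about the shape of the simplified response (a `status` string plus a `results` data frame with one row per match and an `address_components` list column); it is not real API output.

```r
# Mock response with two matches, mimicking the nested structure.
comps <- data.frame(long_name  = c("Southbank", "Victoria", "3006"),
                    short_name = c("Southbank", "VIC", "3006"),
                    stringsAsFactors = FALSE)
comps$types <- list(c("locality", "political"),
                    c("administrative_area_level_1", "political"),
                    "postal_code")
results <- data.frame(
  formatted_address = c("45 Clarke St, Southbank VIC 3006, Australia",
                        "Southbank VIC 3006, Australia"),
  stringsAsFactors = FALSE
)
results$address_components <- list(comps, comps)
res <- list(results = results, status = "OK")

# Inspect the top levels of the nesting.
str(res, max.level = 2)

# Drill down: status, every formatted address, then one component of the best match.
res$status
res$results$formatted_address
best <- res$results$address_components[[1]]
is_postal <- vapply(best$types, function(t) "postal_code" %in% t, logical(1))
postal <- best$long_name[is_postal]
postal
```

Calling `str()` with a small `max.level` is usually the quickest way to see where a given element sits before writing the subsetting code.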

Having said all that, in the development version of googleway I've written a few functions to help access elements of the various API calls. Here are three of them that may be useful to you:

## Install the development version
# devtools::install_github("SymbolixAU/googleway")

res <- google_reverse_geocode(
  location = c(df[1, 'Front.lat'], df[1, 'Front.long']), 
  key = apiKey
  )

geocode_address(res)
# [1] "45 Clarke St, Southbank VIC 3006, Australia"                    
# [2] "Bank Apartments, 275-283 City Rd, Southbank VIC 3006, Australia"
# [3] "Southbank VIC 3006, Australia"                                  
# [4] "Melbourne VIC, Australia"                                       
# [5] "South Wharf VIC 3006, Australia"                                
# [6] "Melbourne, VIC, Australia"                                      
# [7] "CBD & South Melbourne, VIC, Australia"                          
# [8] "Melbourne Metropolitan Area, VIC, Australia"                    
# [9] "Victoria, Australia"                                            
# [10] "Australia"

geocode_address_components(res)
#        long_name short_name                                  types
# 1             45         45                          street_number
# 2  Clarke Street  Clarke St                                  route
# 3      Southbank  Southbank                    locality, political
# 4 Melbourne City  Melbourne administrative_area_level_2, political
# 5       Victoria        VIC administrative_area_level_1, political
# 6      Australia         AU                     country, political
# 7           3006       3006                            postal_code

geocode_type(res)
# [[1]]
# [1] "street_address"
# 
# [[2]]
# [1] "establishment"      "general_contractor" "point_of_interest" 
# 
# [[3]]
# [1] "locality"  "political"
# 
# [[4]]
# [1] "colloquial_area" "locality"        "political"  


Answer 3:

You can do this easily with ggmap::revgeocode; see below:

library(ggmap)
df <- cbind(df, do.call(rbind,
  lapply(1:nrow(df), function(i) {
    # revgeocode() expects c(longitude, latitude), hence columns 2:1
    revgeocode(as.numeric(df[i, 2:1]), output = "more")[
      c("administrative_area_level_1", "locality", "postal_code", "address")]
  })))

#output:
df
#   Front.lat Front.long administrative_area_level_1  locality
#   1 -37.82681   144.9592                    Victoria Southbank
#   2 -37.82681   145.9592                    Victoria    Noojee
#     postal_code                                     address
#   1        3006 45 Clarke St, Southbank VIC 3006, Australia
#   2        3833 Cec Dunns Track, Noojee VIC 3833, Australia

You can add "route" and "street_number" to the variables you want to extract, but, as you can see, the second address has no street number, so selecting that column will cause an error.
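One way around the missing-column error is to fill absent fields with NA before selecting. This is a minimal sketch: `pick_fields` is a hypothetical helper, and `row1` / `row2` are hand-built one-row data frames standing in for what each lookup returns, with values taken from the addresses shown above.

```r
# Hypothetical helper: return the `wanted` columns of a one-row data frame,
# adding NA for any column the lookup did not return.
pick_fields <- function(x, wanted) {
  for (nm in setdiff(wanted, names(x))) x[[nm]] <- NA_character_
  x[wanted]
}

# Mock per-row results; the second one has no street_number.
row1 <- data.frame(locality = "Southbank", postal_code = "3006",
                   route = "Clarke Street", street_number = "45",
                   stringsAsFactors = FALSE)
row2 <- data.frame(locality = "Noojee", postal_code = "3833",
                   route = "Cec Dunns Track",
                   stringsAsFactors = FALSE)

wanted <- c("street_number", "route", "locality", "postal_code")
combined <- do.call(rbind, lapply(list(row1, row2), pick_fields, wanted = wanted))
print(combined)
```

Because every row then has the same columns in the same order, `rbind` no longer fails when a component is absent.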

Note: You can also use sub to extract the information from the address string.
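For instance, a sketch of the sub approach on the two addresses above. The patterns are assumptions that only fit the "..., Locality STATE 1234, Australia" layout seen here; other formats would need different expressions.

```r
address <- c("45 Clarke St, Southbank VIC 3006, Australia",
             "Cec Dunns Track, Noojee VIC 3833, Australia")

# Four digits immediately before ", Australia".
postcode <- sub("^.* ([0-9]{4}), Australia$", "\\1", address)

# Text between the last comma before the state code and the state code itself.
locality <- sub("^.*, (.*) [A-Z]{2,3} [0-9]{4}, Australia$", "\\1", address)

data.frame(locality, postcode, stringsAsFactors = FALSE)
```

Note that `sub()` returns the input unchanged when the pattern does not match, so it is worth checking the results for leftover full addresses.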

Data:

df <- structure(list(Front.lat = c(-37.82681, -37.82681), Front.long = 
      c(144.9592, 145.9592)), .Names = c("Front.lat", "Front.long"), class = "data.frame", 
      row.names = c(NA, -2L))