Background
I'm trying to visualize the results of a kmeans clustering procedure on the following data using voronoi polygons on a US map.
Here is the code I've been running so far:
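library(ggplot2)  # ggplot(), map_data(), fortify()
library(maps)     # state outlines used by map_data("state")
library(deldir)   # deldir(): Delaunay / Dirichlet (Voronoi) tessellation
library(rgeos)    # readWKT(), gPolygonize() (sp-based; retired from CRAN in 2023)

input <- read.csv("LatLong.csv", header = TRUE)  # numeric columns long and lat (kmeans uses every column)
# K Means Clustering
set.seed(123)
km <- kmeans(input, 17)
cent <- data.frame(km$centers)
# Visualization
states <- map_data("state")
StateMap <- ggplot() + geom_polygon(data = states, aes(x = long, y = lat, group = group), col = "white")
# Voronoi: Dirichlet tessellation of the cluster centers
V <- deldir(cent$long, cent$lat)
# Turn each Dirichlet edge (x1, y1, x2, y2) into a WKT LINESTRING
ll <- apply(V$dirsgs, 1, FUN = function(x){
  readWKT(sprintf("LINESTRING(%s %s, %s %s)", x[1], x[2], x[3], x[4]))
})
# Assemble the edges into polygons
pp <- gPolygonize(ll)
v_df <- fortify(pp)
# Plot
StateMap +
  geom_point(data = input, aes(x = long, y = lat, col = factor(km$cluster))) +
  geom_polygon(data = v_df, aes(x = long, y = lat, group = group, fill = id), alpha = .3) +
  geom_label(data = cent, aes(x = long, y = lat, label = row.names(cent)), alpha = .3)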
Producing the following plot: [image not available]
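Voronoi polygons of the centers are a natural way to draw k-means results: once kmeans has converged, every point is assigned to its nearest center, so each cluster is exactly the set of points inside that center's Voronoi cell. A minimal base-R check of this, using hypothetical synthetic long/lat data in place of LatLong.csv (distances are treated as planar, as in the deldir code above):

```r
# Hypothetical synthetic data standing in for LatLong.csv: three separated blobs.
set.seed(123)
input <- data.frame(
  long = c(rnorm(50, -120, 1), rnorm(50, -95, 1), rnorm(50, -75, 1)),
  lat  = c(rnorm(50, 40, 1), rnorm(50, 35, 1), rnorm(50, 42, 1))
)
km <- kmeans(input, centers = 3, nstart = 10)

# For each point, the index of the closest center (squared Euclidean distance),
# i.e. the Voronoi cell of the centers that the point falls in.
nearest_center <- function(pts, centers) {
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    (pts[, 1] - centers[k, 1])^2 + (pts[, 2] - centers[k, 2])^2
  })
  max.col(-d2, ties.method = "first")
}

cell <- nearest_center(as.matrix(input), km$centers)
# Cross-tabulation: each cluster should map onto exactly one Voronoi cell.
table(cluster = km$cluster, voronoi_cell = cell)
```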
Question
I'd like to be able to bound the outer polygons and intersect the result with my map of the United States so that the polygons cover only US land area. I haven't been able to figure out how to do this, though. Any help is greatly appreciated.
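One way to bound the outer cells is to build each cell directly as a bounding box clipped by half-planes. deldir() clips the tessellation to a rectangle (its rw argument, by default slightly larger than the points), but V$dirsgs holds only the Dirichlet edges, not the rectangle's sides, which is likely why gPolygonize() cannot close the outer cells; deldir's tile.list() returns tiles already closed against rw and may be the simpler fix within the existing code. The base-R sketch below instead uses the definition: the cell of center c_i is the box intersected, for every other center c_j, with the half-plane 2 (c_j - c_i) . p <= |c_j|^2 - |c_i|^2. The centers and the rough contiguous-US box are hypothetical values. This only handles a rectangular bound; clipping to the actual US outline still needs a polygon-intersection library.

```r
# Clip a convex polygon (rows = vertices in order) to the half-plane a . p <= b
# (one Sutherland-Hodgman step).
clip_halfplane <- function(poly, a, b) {
  out <- matrix(numeric(0), ncol = 2)
  n <- nrow(poly)
  for (i in seq_len(n)) {
    p <- poly[i, ]
    q <- poly[if (i == n) 1 else i + 1, ]
    fp <- sum(a * p) - b
    fq <- sum(a * q) - b
    if (fp <= 0) out <- rbind(out, p)   # keep vertices on the inside
    if (fp * fq < 0) {                  # edge crosses the bisector line
      t <- fp / (fp - fq)
      out <- rbind(out, p + t * (q - p))
    }
  }
  unname(out)
}

# Bounded Voronoi cells of `centers` (k x 2 matrix) inside box = c(xmin, xmax, ymin, ymax).
voronoi_cells <- function(centers, box) {
  rect <- rbind(c(box[1], box[3]), c(box[2], box[3]),
                c(box[2], box[4]), c(box[1], box[4]))
  lapply(seq_len(nrow(centers)), function(i) {
    cell <- rect
    ci <- centers[i, ]
    for (j in seq_len(nrow(centers))[-i]) {
      cj <- centers[j, ]
      # Keep points at least as close to ci as to cj.
      cell <- clip_halfplane(cell, 2 * (cj - ci), sum(cj^2) - sum(ci^2))
    }
    cell
  })
}

# Shoelace formula for the area of a simple polygon.
polygon_area <- function(poly) {
  x <- poly[, 1]
  y <- poly[, 2]
  abs(sum(x * c(y[-1], y[1]) - c(x[-1], x[1]) * y)) / 2
}

# Hypothetical centers (long, lat) and an approximate contiguous-US bounding box.
centers <- rbind(c(-120, 40), c(-95, 35), c(-75, 42))
box <- c(-125, -66, 24, 50)
cells <- voronoi_cells(centers, box)
areas <- sapply(cells, polygon_area)
sum(areas)  # equals the box area (125 - 66) * (50 - 24) = 1534, up to rounding

# Long-format data frame in the shape of v_df, for geom_polygon(aes(group = id)).
v_df <- do.call(rbind, lapply(seq_along(cells), function(i)
  data.frame(long = cells[[i]][, 1], lat = cells[[i]][, 2], id = factor(i))))
```

Because the cells partition the box, no area is left uncovered at the edges, whatever the number of centers.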
My end goal in asking this question was to write a script in which I can arbitrarily change the number of kmeans clusters and quickly visualize the results with voronoi polygons that cover my desired region. I haven't quite accomplished this yet, but I have made enough progress that I figured posting what I have might lead to a quicker solution.
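Clipping to a non-rectangular outline such as the US needs a polygon-intersection library. The sketch below swaps the deldir/rgeos toolchain for the sf package (a different approach from the one in the post) and wraps the steps in a hypothetical function, kmeans_voronoi(), so that k can be changed freely. It assumes the points are a two-column matrix of projected x/y coordinates in the same CRS as the region polygon; the usage part uses a synthetic L-shaped region instead of the Census file.

```r
library(sf)

# Hypothetical helper (not from the original post): k-means on x/y coordinates,
# then the Voronoi cells of the k centers clipped to `region`.
# pts_xy: two-column numeric matrix, same CRS as `region` (ideally projected).
# region: sfc/sf polygon(s) describing the area the cells should cover.
kmeans_voronoi <- function(pts_xy, k, region, seed = 123) {
  set.seed(seed)
  km <- kmeans(pts_xy, centers = k, nstart = 10)

  # One POINT per center, carrying the region's CRS.
  ctr <- st_cast(st_sfc(st_multipoint(unname(km$centers)), crs = st_crs(region)),
                 "POINT")

  # Voronoi tiles of the centers, extended to cover the region's bounding box.
  envelope <- st_as_sfc(st_bbox(region))
  tiles <- st_collection_extract(st_voronoi(st_union(ctr), envelope = envelope),
                                 "POLYGON")

  # Tiles are not guaranteed to come back in input order, so label each tile
  # with the index of the center it contains.
  cluster <- vapply(st_intersects(tiles, ctr), function(i) i[1], integer(1))
  cells <- st_sf(cluster = cluster, geometry = tiles)
  st_agr(cells) <- "constant"

  # Keep only the part of each tile that lies inside the region.
  list(km = km, cells = st_intersection(cells, st_union(region)))
}

# Usage on a synthetic, non-convex (L-shaped) region without a CRS.
region <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(10, 0), c(10, 4), c(4, 4), c(4, 10), c(0, 10), c(0, 0)))))
set.seed(1)
xy <- cbind(runif(500, 0, 10), runif(500, 0, 10))
xy <- xy[xy[, 1] <= 4 | xy[, 2] <= 4, ]  # keep only points inside the L
res <- kmeans_voronoi(xy, k = 5, region = region)
sum(st_area(res$cells))  # area of the L: 10 * 4 + 4 * 6 = 64, up to rounding
```

With real data, the Census state boundaries would be read with st_read(), restricted to the contiguous states, and both they and the points transformed with st_transform() to a projected CRS (EPSG:5070, NAD83 / Conus Albers, is a common choice for the lower 48) before calling the function; st_voronoi() warns that it does not triangulate longitude/latitude data correctly. The clipped cells can be drawn with ggplot2's geom_sf(), e.g. geom_sf(data = res$cells, aes(fill = factor(cluster))).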
I also found the function below while searching for a solution online.
With the above function defined, the polygons can be extracted accordingly:
In order to get the voronoi polygons to fit nicely with a US map, I downloaded cb_2014_us_state_20m from the Census website and ran the following:
From here I could visualize my results using ggplot like before:
Summary of Updates
The overlapping voronoi polygons still aren't a perfect fit (I'm guessing because of a lack of input data in the Pacific Northwest), although I'd imagine that should be a simple fix, and I'll try to update that as soon as possible. Also, if I change the number of kmeans centroids at the beginning of my function and then re-run everything, the polygons don't look very nice at all, which is not what I was originally hoping for. I'll continue to update with improvements.