I am trying to replicate the code from Andrew Ng's Machine Learning course on Coursera in R (as the course is in Octave).
Basically, I have to plot a nonlinear decision boundary (at p = 0.5) for a regularized polynomial logistic regression.
I can easily replicate the plot with the base library:
contour(u, v, z, levels = 0)
points(x = data$Test1, y = data$Test2)
where:
u <- v <- seq(-1, 1.5, length.out = 100)
and z is a 100x100 matrix holding the value of z at every point of the grid.
The dimensions of data are 118x3.
I cannot do it in ggplot2. Does somebody know how to replicate the same in ggplot2? I tried with:
z = as.vector(t(z))
ggplot(data, aes(x = Test1, y = Test2)) + geom_contour(aes(x = u, y = v, z = z))
But I get the error: Aesthetics must be either length 1 or the same as the data (118): colour, x, y, shape
Thanks.
EDIT (Adding plot created from code of missuse):
What you need is to convert the coordinates into long format. Here is an example using the volcano data set:
data(volcano)
In base R:
contour(volcano)
With ggplot2:
library(tidyverse)
as.data.frame(volcano) %>%                       # convert the matrix to a data frame
  rownames_to_column() %>%                       # get row coordinates
  gather(key, value, -rowname) %>%               # convert to long format
  mutate(key = as.numeric(gsub("V", "", key)),   # convert the column names to numbers
         rowname = as.numeric(rowname)) %>%
  ggplot() +
  geom_contour(aes(x = rowname, y = key, z = value))
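To make "long format" concrete, here is a minimal base R sketch on a tiny made-up matrix (m and long are illustrative names, not part of the code above). Each matrix cell becomes one row holding its row index, column index and value, which is the shape geom_contour expects for x, y and z:

```r
# Tiny made-up matrix: 2 rows, 3 columns
m <- matrix(1:6, nrow = 2)

# One row per cell: row index, column index, cell value
long <- data.frame(
  row   = as.vector(row(m)),
  col   = as.vector(col(m)),
  value = as.vector(m)
)

print(long)
```

The tidyverse pipeline above produces the same kind of table, only it recovers the indices from the row and column names instead of from row() and col().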
If you would like to label the contours directly, as in the base R plot, you can use the directlabels library.
First map the color/fill to a variable:
as.data.frame(volcano) %>%
  rownames_to_column() %>%
  gather(key, value, -rowname) %>%
  mutate(key = as.numeric(gsub("V", "", key)),
         rowname = as.numeric(rowname)) %>%
  ggplot() +
  geom_contour(aes(x = rowname,
                   y = key,
                   z = value,
                   colour = ..level..)) -> some_plot
And then:
library(directlabels)
direct.label(some_plot, list("far.from.others.borders", "calc.boxes", "enlarge.box",
                             box.color = NA, fill = "transparent", "draw.rects"))
To add markers at specific coordinates, you just need to add another layer with the appropriate data.
First, the previous plot:
as.data.frame(volcano) %>%
  rownames_to_column() %>%
  gather(key, value, -rowname) %>%
  mutate(key = as.numeric(gsub("V", "", key)),
         rowname = as.numeric(rowname)) %>%
  ggplot() +
  geom_contour(aes(x = rowname, y = key, z = value)) -> plot_cont
Then add a layer with points, for instance:
plot_cont +
  geom_point(data = data.frame(x = c(35, 47, 61),
                               y = c(22, 37, 15)),
             aes(x = x, y = y), color = "red")
You can add any type of layer this way: geom_line and geom_text, to name a few.
EDIT2: To change the scale of the axes there are several options. One is to assign appropriate rownames and colnames to the matrix.
Here I assign a sequence from 0 to 2 for the x axis and from 0 to 5 for the y axis:
rownames(volcano) <- seq(from = 0,
                         to = 2,
                         length.out = nrow(volcano)) # or some vector like u
colnames(volcano) <- seq(from = 0,
                         to = 5,
                         length.out = ncol(volcano)) # or some vector like v
as.data.frame(volcano) %>%
  rownames_to_column() %>%
  gather(key, value, -rowname) %>%
  mutate(key = as.numeric(key),       # names are already numeric strings here
         rowname = as.numeric(rowname)) %>%
  ggplot() +
  geom_contour(aes(x = rowname, y = key, z = value))
ggplot2 works most efficiently with data in long format. Here's an example with fake data:
library(tidyverse)
u <- v <- seq(-1, 1.5, length.out = 100)
# Generate fake data
z = outer(u, v, function(a, b) sin(2*a^3)*cos(5*b^2))
rownames(z) = u
colnames(z) = v
# Convert data to long format and plot
as.data.frame(z) %>%
  rownames_to_column(var = "row") %>%
  gather(col, value, -row) %>%
  mutate(row = as.numeric(row),
         col = as.numeric(col)) %>%
  ggplot(aes(col, row, z = value)) +
  geom_contour(bins = 20) +
  theme_classic()
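Applied to the setup in the question, the same idea might look like the sketch below. The z matrix and the data frame here are made-up stand-ins (the real ones come from the fitted model and the course data set), and the sketch assumes that rows of z correspond to u and columns to v. The original error arises because the contour layer inherits the 118-row data while u, v and z have different lengths. Giving that layer its own long-format data frame avoids this, and breaks = 0 draws only the z = 0 contour, like levels = 0 in base contour().

```r
library(ggplot2)

# Grid from the question
u <- v <- seq(-1, 1.5, length.out = 100)

# Made-up stand-in for the model's z matrix (assumption: rows follow u, columns follow v)
z <- outer(u, v, function(a, b) a^2 + b^2 - 1)

# expand.grid varies its first argument fastest, which matches the
# column-major order of as.vector(z) when rows of z correspond to u
grid <- expand.grid(u = u, v = v)
grid$z <- as.vector(z)

# Made-up stand-in for the 118-row data frame from the question
set.seed(1)
data <- data.frame(Test1 = runif(118, -1, 1.5),
                   Test2 = runif(118, -1, 1.5))

# Each layer gets its own data, so the differing row counts no longer clash
p <- ggplot() +
  geom_point(data = data, aes(x = Test1, y = Test2)) +
  geom_contour(data = grid, aes(x = u, y = v, z = z), breaks = 0)

print(p)
```

If z was instead built with rows following v, use as.vector(t(z)) when filling grid$z, as in the question's attempt.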