I am trying to layer multiple data frames in one line plot, with x = index and y = values. The 8 data.frames I work with come in this format (index and value) and are several hundred rows long:
Values
2306 0.000000
2307 1.004711
Because the data frames don't all have the same size, I am also trying to rescale the data sets by converting the index to a percentage: (index / total number of values) * 100. Should I do this in the plotting code, or is it better to convert the data sets before plotting?
Hope the hivemind of StackOverflow can help an R newbie
If you want them all in a single plot, it would be easiest if you "stack" the data frames first and include a column that identifies which original data frame the data came from.
library(dplyr)
library(ggplot2)
First create fake data. The code below creates a list containing eight data frames. We'll assume this is where we start after we've read in the data. If you're reading in your data frames from separate files (csv files, for example), just read them all into a single list and then use bind_rows to stack them:
# Fake data
set.seed(954)
df = lapply(paste0("d", 0:7), function(x) {
  # Each fake data frame gets a random length of 100, 200, ..., or 500 rows
  n = sample(seq(100, 500, 100), 1)
  data.frame(source = x, index = 1:n, values = cumsum(rnorm(n)))
})
# Stack the eight data frames into a single data frame
df = bind_rows(df)
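If the data frames live in separate csv files, the read-then-stack step might look roughly like the sketch below. The temporary folder, the file names (sign1.csv, sign2.csv) and the object names dat_list and stacked are made up for illustration. Passing .id = "source" asks bind_rows to store each list element's name in a source column, so the column does not have to be added by hand.

```r
library(dplyr)

# Hypothetical setup: write two small csv files to a temporary folder
# so that the example is self-contained
tmp <- file.path(tempdir(), "signals")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(index = 1:3, values = c(0, 1.0, 1.5)),
          file.path(tmp, "sign1.csv"), row.names = FALSE)
write.csv(data.frame(index = 1:5, values = c(0, 0.5, 0.2, 0.9, 1.1)),
          file.path(tmp, "sign2.csv"), row.names = FALSE)

# Read every csv file in the folder into a list named after the files
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
dat_list <- lapply(files, read.csv)
names(dat_list) <- tools::file_path_sans_ext(basename(files))

# Stack the list; .id puts each list name into a "source" column
stacked <- bind_rows(dat_list, .id = "source")
```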
Plot using ggplot. We use source (the name of the original data frame) as the colour aesthetic:
ggplot(df, aes(index, values, colour = source)) +
  geom_line() +
  theme_bw()
Or, if you want to normalize index to span the same range for each data frame:
ggplot(df %>% group_by(source) %>%
         mutate(index = index / max(index)),
       aes(index, values, colour = source)) +
  geom_line() +
  theme_bw()
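On the question of whether to convert before plotting or inside the plotting code: either works, but doing the conversion as a separate step keeps the rescaled column around for later use. Below is a minimal sketch of the percent version from the question, assuming a stacked data frame with source, index and values columns. The names df_demo, df_pct and pct are hypothetical. It uses the row position within each source rather than the index value itself, which matters if the index does not start at 1 (as in the sample rows 2306, 2307).

```r
library(dplyr)
library(ggplot2)

# Small stand-in data: two sources of different length whose index
# does not start at 1
df_demo <- bind_rows(
  data.frame(source = "a", index = 2306:2315, values = cumsum(rep(0.5, 10))),
  data.frame(source = "b", index = 101:104, values = c(0, 1, 3, 2))
)

# Position of each row within its own source, as a percentage of the
# number of rows in that source
df_pct <- df_demo %>%
  group_by(source) %>%
  mutate(pct = row_number() / n() * 100) %>%
  ungroup()

# Every source now ends at 100 on the x axis
p_pct <- ggplot(df_pct, aes(pct, values, colour = source)) +
  geom_line() +
  theme_bw()
```

Printing p_pct draws the plot.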
UPDATE: In response to your comment, if the data frames already exist in your workspace (here assumed to be named sign1 to sign8), you could do this to get a single data frame:
df = lapply(paste0("sign", 1:8), function(x) {
  # get(x) looks up the data frame whose name is stored in x
  data.frame(source = x, get(x))
})
df=bind_rows(df)
But you must have read the data into R at some point, and you can take care of this type of processing when you read the data files into R.
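One more piece of processing that may be needed: the sample rows in the question look as if the index is stored in the row names, next to a single Values column. That is only a guess from the printed output. If it is the case, the index can be turned into an ordinary column before stacking, for example like this (sign1 here is a hypothetical two-row stand-in, and sign1_long is a made-up name):

```r
# Hypothetical stand-in for one of the question's data frames: a single
# "Values" column with the index held in the row names
sign1 <- data.frame(Values = c(0.000000, 1.004711),
                    row.names = c("2306", "2307"))

# Move the row names into an explicit integer column
sign1_long <- data.frame(
  index  = as.integer(rownames(sign1)),
  values = sign1$Values
)
```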
No hive mind required:
d0 <- data.frame(index = 1:100, values = rnorm(100))
d1 <- data.frame(index = 1:200, values = rnorm(200))
d2 <- data.frame(index = 1:100, values = rnorm(100))
d3 <- data.frame(index = 1:100, values = rnorm(100))
d4 <- data.frame(index = 1:100, values = rnorm(100))
d5 <- data.frame(index = 1:500, values = rnorm(500))
d6 <- data.frame(index = 1:100, values = rnorm(100))
d7 <- data.frame(index = 1:100, values = rnorm(100))
library(ggplot2)
p0 <- ggplot(d0, aes(x=index, y=values)) + geom_point(alpha=.3)
p1 <- ggplot(d1, aes(x=index, y=values)) + geom_point(alpha=.3)
p2 <- ggplot(d2, aes(x=index, y=values)) + geom_point(alpha=.3)
p3 <- ggplot(d3, aes(x=index, y=values)) + geom_point(alpha=.3)
p4 <- ggplot(d4, aes(x=index, y=values)) + geom_point(alpha=.3)
p5 <- ggplot(d5, aes(x=index, y=values)) + geom_point(alpha=.3)
p6 <- ggplot(d6, aes(x=index, y=values)) + geom_point(alpha=.3)
p7 <- ggplot(d7, aes(x=index, y=values)) + geom_point(alpha=.3)
# multiplot() comes from the Rmisc package
library(Rmisc)
multiplot(p0, p1, p2, p3, p4, p5, p6, p7, cols=2)