I would like to perform a linear interpolation in a variable of a data frame which takes into account the: 1) time difference between the two points, 2) the moment when the data was taken and 3) the individual taken for measure the variable.
For example in the next dataframe:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, NA, 5, NA, 7, 5, NA, 7))
df
I would like to obtain:
result <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, 4, 5, 6, 7, 5, 5.5, 6))
result
I cannot use exclusively the function na.approx
of the package zoo
because all observations are not consecutives, some observations belong to one individual and other observations belong to other ones. The reason is because if the second individual would have its first obsrevation with NA
and I would use exclusively the function na.approx
, I would be using information from the individual==1
to interpolate the NA
of the individual==2
(e.g the next data frame would have sucherror)
df_2 <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(1, 2, 3, NA, 5, NA, 7, NA, 5, 7))
df_2
I have tried using the packages zoo
and dplyr
:
library(dplyr)
library(zoo)
proof <- df %>%
group_by(Individuals) %>%
na.approx(df$Value)
But I cannot perform group_by
in a zoo
object.
Do you know how to interpolate NA
values in one variable by groups?
Thanks in advance,
Use data.frame
, rather than cbind
to create your data. cbind
returns a matrix, but you need a data frame for dplyr
. Then use na.approx
inside mutate
. I've commented out group_by
, as you haven't provided the grouping variable in your data, but the approach should work once you've added the grouping variable to the data frame.
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))
library(dplyr)
library(zoo)
df %>%
group_by(Individuals) %>%
mutate(ValueInterp = na.approx(Value, na.rm=FALSE))
time Individuals Value ValueInterp
1 1 1 NA NA
2 2 1 2 2
3 3 1 3 3
4 4 1 NA 4
5 5 1 5 5
6 6 1 NA 6
7 7 1 7 7
8 1 2 8 8
9 2 2 NA 9
10 3 2 10 10
Update: To interpolate multiple columns, we can use mutate_at
. Here's an example with two value columns. We use mutate_at
to run na.approx
on all columns that include "Value"
in the column name. list(interp=na.approx)
tells mutate_at
to generate new column names by running na.approx
and adding interp
as a suffix to generate the new column names:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)
df %>%
group_by(Individuals) %>%
mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)
time Individuals Value1 Value2 Value1_interp Value2_interp
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 NA NA NA NA
2 2 1 2 4 2 4
3 3 1 3 6 3 6
4 4 1 NA NA 4 8
5 5 1 5 10 5 10
6 6 1 NA NA 6 12
7 7 1 7 14 7 14
8 1 2 8 16 8 16
9 2 2 NA NA 9 18
10 3 2 10 20 10 20
If you don't want to preserve the original, uninterpolated columns, you can do:
df %>%
group_by(Individuals) %>%
mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
We can use data.table
library(data.table)
library(zoo)
setDT(df1)[, ValueInterp:= na.approx(Value, na.rm=TRUE), by = Individual]