I have the following data frame (simplified), in which the country variable is a factor and the value variable has missing values:
country value
AUT NA
AUT 5
AUT NA
AUT NA
GER NA
GER NA
GER 7
GER NA
GER NA
The following generates the above data frame:
data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))
Now I would like to replace the NA values within each country subset using last observation carried forward (LOCF). I know the na.locf function in the zoo package. Calling
data <- na.locf(data)
would give me the following data frame:
country value
AUT NA
AUT 5
AUT 5
AUT 5
GER 5
GER 5
GER 7
GER 7
GER 7
However, the function should only be used on the individual subsets split by the country. The following is the output I would need:
country value
AUT NA
AUT 5
AUT 5
AUT 5
GER NA
GER NA
GER 7
GER 7
GER 7
I can't think of an easy way to implement it. Before starting with for-loops, I was wondering if anyone has any idea as to how to solve this.
Many thanks!!
Here's a ddply solution. Try this:

Edit: From the ddply help you can find how .variables may be specified, so other alternatives to get what you want are:
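A minimal sketch of what such ddply calls could look like, assuming the plyr and zoo packages are installed. The helper name fill_group is made up for illustration, and na.rm = FALSE is my assumption for keeping the leading NAs so that every group keeps its length. The three calls differ only in how the grouping variable is passed: quoted with .(), as a character vector, or as a formula.

```r
library(plyr)
library(zoo)

DF <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Hypothetical helper: carry the last observation forward within one group.
# na.rm = FALSE keeps leading NAs, so the group length is unchanged.
fill_group <- function(d) {
  d$value <- na.locf(d$value, na.rm = FALSE)
  d
}

res_quoted  <- ddply(DF, .(country), fill_group)  # quoted variable
res_char    <- ddply(DF, "country", fill_group)   # character vector
res_formula <- ddply(DF, ~country, fill_group)    # formula

print(res_quoted)
```

Passing DF$country as .variables is not one of these forms, which fits the error mentioned in this answer.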
Note that replacing .variables with DF$variable is not allowed; that's why you got an error when doing this. DF is your data.frame.

A combination of the packages dplyr and imputeTS can do the job.
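With the na.remaining parameter of the na.locf function in imputeTS (named na_remaining and na_locf in current releases of the package) you additionally have the option to choose what to do with the NAs that remain after LOCF, which are the ones at the start of each series.
These are the options:
"keep" returns the series with those NAs left in place
"rm" removes the remaining NAs
"mean" replaces the remaining NAs with the overall mean
"rev" fills them by running the procedure from the reverse direction
By choosing "mean" you would, for example, get 7 for every GER row in this example.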
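A rough sketch of the dplyr plus imputeTS combination, assuming a current imputeTS release where the function is spelled na_locf and the argument na_remaining (older releases used the dotted names). The result names filled_keep and filled_mean are just illustrative.

```r
library(dplyr)
library(imputeTS)

data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# LOCF within each country; NAs before the first observation are kept.
filled_keep <- data %>%
  group_by(country) %>%
  mutate(value = na_locf(value, na_remaining = "keep")) %>%
  ungroup()

# Same, but the NAs that LOCF cannot reach are replaced by the group mean.
filled_mean <- data %>%
  group_by(country) %>%
  mutate(value = na_locf(value, na_remaining = "mean")) %>%
  ungroup()

print(filled_keep)
print(filled_mean)
```

Note that "rm" changes the length of the vector, so it would not fit inside a grouped mutate like this one.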
Split the data.frame with by and use na.locf on the subsets:

If you would like to remove the row names:
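Both steps could look roughly like the following, assuming zoo is installed. Applying na.locf to the value column with na.rm = FALSE is my choice here so that leading NAs are kept; it is a sketch, not necessarily the exact call this answer used.

```r
library(zoo)

data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Split by country, fill each piece, then bind the pieces back together.
filled <- do.call(rbind, by(data, data$country, function(d) {
  d$value <- na.locf(d$value, na.rm = FALSE)
  d
}))

# rbind leaves row names such as "AUT.1"; reset them to 1..n.
rownames(filled) <- NULL

print(filled)
```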
If speed is a consideration, then this unstack/stack solution is about 4 to 6 times faster than the others on my system, although it does entail a slightly longer line of code:

Another approach is:
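As a sketch only: the first part below shows what an unstack/stack version might look like, and the second part shows one other possible approach using base R's ave. The ave variant is my own pick for illustration and not necessarily the approach meant above. Both assume zoo is installed and that the rows are already grouped by country in sorted order, since stack returns the groups one after another.

```r
library(zoo)

data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# unstack/stack: one vector per country, fill each, then stack them again.
per_country  <- unstack(data, value ~ country)
filled_list  <- lapply(per_country, na.locf, na.rm = FALSE)
filled_stack <- stack(filled_list)
names(filled_stack) <- c("value", "country")
filled_stack <- filled_stack[, c("country", "value")]

# ave: apply the fill within each country and keep the original row order.
filled_ave <- data
filled_ave$value <- ave(data$value, data$country,
                        FUN = function(x) na.locf(x, na.rm = FALSE))

print(filled_stack)
print(filled_ave)
```

The ave version does not depend on the row order, which makes it the safer of the two if the data are not sorted by country.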
The tidyverse way, albeit not using locf, is:
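A minimal sketch, assuming this refers to tidyr's fill used on data grouped with dplyr (both packages need to be installed). fill with .direction = "down" carries the previous non-missing value downward within each group, which gives the same result as LOCF for this example.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Fill downward within each country; leading NAs stay NA.
filled <- data %>%
  group_by(country) %>%
  fill(value, .direction = "down") %>%
  ungroup()

print(filled)
```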
You simply need to split by country, then do either a zoo::na.locf() or na.fill, filling to the right. Here is an example explicitly showing the three-component arg syntax of na.fill:
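A sketch of how that could look, assuming zoo is installed. The three components of fill apply to the NAs left of the data, inside the data, and right of the data, in that order. Here list(NA, NA, "extend") leaves the left and interior NAs alone and repeats the rightmost observed value to the right. This matches LOCF only when a group has no NAs between observed values, as in this example; for interior gaps na.locf is the closer fit.

```r
library(zoo)

data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Split by country, then fill only to the right of the observed values.
pieces <- split(data, data$country)
filled_pieces <- lapply(pieces, function(d) {
  # fill = list(left, interior, right)
  d$value <- na.fill(d$value, fill = list(NA, NA, "extend"))
  d
})

filled <- do.call(rbind, filled_pieces)
rownames(filled) <- NULL

print(filled)
```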