How to replace NaN value with previous non-NaN wit

Posted 2019-08-27 15:35

Question:

I need to replace the NaN values with the previous non-NaN value within the group.

Here is an example:

+-------+------------+-------+
| ts_id |    date    | value |
+-------+------------+-------+
|     2 | 01/10/2014 | 18    |
|     2 | 01/11/2014 | 15    |
|     2 | 01/12/2014 | NaN   |
|     2 | 01/01/2015 | NaN   |
|     2 | 01/02/2015 | NaN   |
|     3 | 01/03/2015 | 19    |
|     3 | 01/04/2015 | 20    |
|     3 | 01/10/2015 | 12    |
|     3 | 01/11/2015 | 17    |
|     3 | 01/12/2015 | NaN   |
|     3 | 01/01/2016 | NaN   |
|     3 | 01/08/2016 | 7     |
|     3 | 01/09/2016 | NaN   |
|     3 | 01/10/2016 | NaN   |
|     3 | 01/11/2016 | NaN   |
|     3 | 01/12/2016 | NaN   |
|     3 | 01/01/2017 | NaN   |
+-------+------------+-------+

Data:

data <- structure(list(
  ts_id = c(2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
  date = structure(c(16344, 16375, 16405, 16436, 16467, 16495, 16526,
                     16709, 16740, 16770, 16801, 17014, 17045, 17075,
                     17106, 17136, 17167), class = "Date"),
  value = c(18, 15, NaN, NaN, NaN, 19, 20, 12, 17, NaN, NaN, 7, NaN,
            NaN, NaN, NaN, NaN)),
  row.names = c(NA, -17L), vars = "ts_id", drop = TRUE,
  indices = list(0:16), group_sizes = 17L, biggest_group_size = 17L,
  labels = structure(list(ts_id = 3L), row.names = c(NA, -1L),
                     class = "data.frame", vars = "ts_id", drop = TRUE),
  class = "data.frame")

Within each group (identified by ts_id), I can have NaN values at any given date. I need to replace each NaN with the most recent non-NaN value.

The results should look like:

+-------+------------+-------+
| ts_id |    date    | value |
+-------+------------+-------+
|     2 | 01/10/2014 |    18 |
|     2 | 01/11/2014 |    15 |
|     2 | 01/12/2014 |    15 |
|     2 | 01/01/2015 |    15 |
|     2 | 01/02/2015 |    15 |
|     3 | 01/03/2015 |    19 |
|     3 | 01/04/2015 |    20 |
|     3 | 01/10/2015 |    12 |
|     3 | 01/11/2015 |    17 |
|     3 | 01/12/2015 |    17 |
|     3 | 01/01/2016 |    17 |
|     3 | 01/08/2016 |     7 |
|     3 | 01/09/2016 |     7 |
|     3 | 01/10/2016 |     7 |
|     3 | 01/11/2016 |     7 |
|     3 | 01/12/2016 |     7 |
|     3 | 01/01/2017 |     7 |
+-------+------------+-------+

Thanks in advance.

Answer 1:

You can use this:

library(dplyr)
library(zoo) # for the na.locf function
data %>% 
  group_by(ts_id) %>% # group by id
  mutate(value = na.locf(value, na.rm = FALSE)) # na.locf fills with the last non-missing value; na.rm = FALSE keeps the length unchanged

#head()
# # A tibble: 6 x 3
# # Groups:   ts_id [2]
# ts_id date       value
# <dbl> <date>     <dbl>
# 1     2 2014-10-01    18
# 2     2 2014-11-01    15
# 3     2 2014-12-01    15
# 4     2 2015-01-01    15
# 5     2 2015-02-01    15
# 6     3 2015-03-01    19
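
The na.rm = FALSE argument matters here because mutate() needs a result with the same length as the group. A small sketch of the difference, using made-up values (x is a hypothetical vector, not part of the question's data):

```r
library(zoo)  # provides na.locf()

# Hypothetical values with a gap at the start
x <- c(NaN, 18, NaN, 15, NaN)

# Default (na.rm = TRUE): the leading gap is dropped, so the result is shorter
filled_default <- na.locf(x)

# na.rm = FALSE: the leading gap is kept as missing, so the length is preserved
filled_keep <- na.locf(x, na.rm = FALSE)

length(filled_default)  # 4
length(filled_keep)     # 5
```

If a group starts with NaN, the shorter default result would not fit back into the column, which is why the answer passes na.rm = FALSE.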


Answer 2:

Following the same logic as na.locf, but staying within the tidyverse, we can first convert the NaN values to NA and then use fill():

library(tidyverse)

data %>% 
  group_by(ts_id) %>% 
  mutate(value = replace(value, is.nan(value), NA)) %>% 
  fill(value) 
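
For reference, the carry-forward step that both answers rely on can also be sketched in base R without extra packages. This is only an illustration under stated assumptions: locf is a hypothetical helper name, and df is a small made-up data frame in the same shape as the question's data.

```r
# Hypothetical helper: carry the last non-missing value forward
locf <- function(x) {
  ok <- !is.na(x)      # is.na() is TRUE for both NA and NaN
  idx <- cumsum(ok)    # position of the most recent non-missing value (0 = none yet)
  c(NA, x[ok])[idx + 1]  # leading gaps (idx == 0) map to NA
}

# Toy data shaped like the example (made-up values)
df <- data.frame(
  ts_id = c(2, 2, 2, 3, 3, 3),
  value = c(18, 15, NaN, 19, NaN, NaN)
)

# ave() applies the helper separately within each ts_id group
df$value <- ave(df$value, df$ts_id, FUN = locf)
df$value  # 18 15 15 19 19 19
```

Because the fill runs per group, a gap at the start of one ts_id is left missing rather than being filled from the previous group.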


Tags: r dplyr