Extend an irregular sequence and add zeros for missing values

Posted 2020-02-12 05:49

I have a data frame with a sequence in 'col1' and values in 'col2':

col1 col2
2     0.02
5     0.12
9     0.91
13    1.13

I want to expand the irregular sequence in 'col1' with a regular sequence from 1 to 13. For the values in 'col1' which are missing in the original data, I want 'col2' to have the value 0 in the final output:

col1  col2
1     0
2     0.02
3     0
4     0
5     0.12
6     0
7     0
8     0
9     0.91
10    0
11    0
12    0
13    1.13

How can I do this in R?
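The answers below refer to the input as df (one calls it d, another test). Here is a minimal reproducible version of the sample data, assuming df as the name:

```r
# Reproducible sample input; the name `df` is an assumption matching most answers
df <- data.frame(
  col1 = c(2, 5, 9, 13),
  col2 = c(0.02, 0.12, 0.91, 1.13)
)
str(df)
```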

Tags: r
9 answers
我只想做你的唯一
Reply #2 · 2020-02-12 06:22

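Here is a function that uses expandRows from the splitstackshape package. It repeats each row by the gap to the previous col1 value, renumbers col1 as 1 to max(col1), and then sets col2 to 0 wherever the new col1 value was not in the original data: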

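expand_seq <- function(x){
  # repeat counts: the first col1 value itself, then the gaps between
  # consecutive values (assumes col1 is sorted ascending)
  x$new <- c(x$col1[1], diff(x$col1))
  # repeat each row 'new' times, giving max(col1) rows in total
  new_df <- splitstackshape::expandRows(x, 'new')
  new_df$col1 <- seq(max(new_df$col1))
  # only rows whose col1 existed originally keep their col2 value
  new_df$col2[!new_df$col1 %in% x$col1] <- 0
  rownames(new_df) <- NULL
  return(new_df)
}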

expand_seq(df)
#   col1 col2
#1     1 0.00
#2     2 0.02
#3     3 0.00
#4     4 0.00
#5     5 0.12
#6     6 0.00
#7     7 0.00
#8     8 0.00
#9     9 0.91
#10   10 0.00
#11   11 0.00
#12   12 0.00
#13   13 1.13
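For readers without splitstackshape, the same row-repetition idea can be sketched in base R with rep(). This is an illustrative rewrite, not part of the original answer; expand_seq_base is a hypothetical name, and it assumes col1 holds positive integers sorted in ascending order.

```r
# Base-R sketch of the row-repetition idea (hypothetical helper name).
# Assumes col1 holds positive integers sorted in ascending order.
expand_seq_base <- function(x) {
  # Repeat each row by the gap to the previous col1 value
  # (the first row by col1[1] itself), so the total row count is max(col1)
  times <- c(x$col1[1], diff(x$col1))
  out <- x[rep(seq_len(nrow(x)), times = times), , drop = FALSE]
  out$col1 <- seq_len(nrow(out))
  # Zero every row that does not correspond to an original col1 value
  out$col2[!out$col1 %in% x$col1] <- 0
  rownames(out) <- NULL
  out
}

df <- data.frame(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 1.13))
expand_seq_base(df)
```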
beautiful°
Reply #3 · 2020-02-12 06:26

Another way would be:

# append a (i, 0) row for every i in 1..max(col1) that is absent from col1
for (i in 1:max(test$col1)) {
  if (!(i %in% test$col1)) test <- rbind(test, c(i, 0))
}
test <- test[order(test$col1), ]

Axeman's answer is really sweet, though.

Edit: Data used --

test <- structure(list(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 
1.13)), .Names = c("col1", "col2"), row.names = c(NA, -4L), class = "data.frame")

DISCLAIMER: This should not be used for big datasets. With 1k rows it finished almost instantly, but a second test with 100k rows has been running for minutes, which confirms Axeman's concerns in his comment. Each rbind call inside the loop copies the whole data frame, so the run time grows roughly quadratically with the number of rows.
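A vectorised sketch of the same idea avoids growing the data frame inside a loop: find all absent values once with setdiff() and bind them in a single rbind() call. This is a suggested alternative, not the answer's code; the variable name absent is illustrative.

```r
test <- data.frame(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 1.13))

# All values in 1..max(col1) missing from col1, found in one vectorised step
absent <- setdiff(seq_len(max(test$col1)), test$col1)

# Bind all new rows at once, so the data frame is copied only once
test <- rbind(test, data.frame(col1 = absent, col2 = 0))
test <- test[order(test$col1), ]
rownames(test) <- NULL
test
```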

别忘想泡老子
Reply #4 · 2020-02-12 06:30

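We can use base R with merge and replace. merge(..., all.x = TRUE) keeps every value of 1:13 and gives NA in col2 where there is no match, and replace() then turns those NAs into 0: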

transform(merge(data.frame(col1 = 1:13), df, all.x = TRUE),
          col2 = replace(col2, is.na(col2), 0))
#    col1 col2
#1     1 0.00
#2     2 0.02
#3     3 0.00  
#4     4 0.00
#5     5 0.12
#6     6 0.00
#7     7 0.00
#8     8 0.00
#9     9 0.91
#10   10 0.00
#11   11 0.00
#12   12 0.00
#13   13 1.13
贼婆χ
Reply #5 · 2020-02-12 06:38

I didn't see a simple merge solution, so here is one:

res <- merge(data.frame(col1 = 1:max(df$col1)), df, by = "col1", all.x = TRUE)
res$col2 <- ifelse(is.na(res$col2), 0, res$col2)

The second line replaces the NAs produced by the merge (a left outer join) with zeros. As @Axeman points out, this can also be done with indexed assignment:

res$col2[is.na(res$col2)] <- 0

The result is:

res
##   col1 col2
##1     1 0.00
##2     2 0.02
##3     3 0.00
##4     4 0.00
##5     5 0.12
##6     6 0.00
##7     7 0.00
##8     8 0.00
##9     9 0.91
##10   10 0.00
##11   11 0.00
##12   12 0.00
##13   13 1.13
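The same merge approach generalises easily to another range or fill value. Below is a minimal sketch wrapping it in a function; pad_sequence and its from, to and fill arguments are hypothetical names, not part of the answer. It assumes col2 has no genuine NAs (they would also be replaced) and note that rows of df with col1 outside from..to are dropped by the left join.

```r
# Hypothetical helper: fill every integer from `from` to `to` that is
# absent from df$col1 with `fill`, using a left outer join
pad_sequence <- function(df, from = 1, to = max(df$col1), fill = 0) {
  full <- data.frame(col1 = seq(from, to))
  # all.x = TRUE keeps every row of `full`; merge sorts the result by col1
  res <- merge(full, df, by = "col1", all.x = TRUE)
  res$col2[is.na(res$col2)] <- fill
  res
}

df <- data.frame(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 1.13))
pad_sequence(df)
pad_sequence(df, to = 15)
```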
看我几分像从前
Reply #6 · 2020-02-12 06:39
library(tidyr)

complete(d, col1 = 1:13, fill = list(col2 = 0))

or

complete(d, col1 = seq(max(col1)), fill = list(col2 = 0))
# A tibble: 13 × 2
    col1  col2
   <int> <dbl>
1      1  0.00
2      2  0.02
3      3  0.00
4      4  0.00
5      5  0.12
6      6  0.00
7      7  0.00
8      8  0.00
9      9  0.91
10    10  0.00
11    11  0.00
12    12  0.00
13    13  1.13

or

library(dplyr)

left_join(data.frame(col1 = seq(max(d$col1))), d, by = "col1")

But this will leave NAs instead of zeros.
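If you prefer the left_join route, the NAs can be turned into zeros in the same pipeline. A minimal sketch using dplyr's coalesce(), which returns the first non-missing value element-wise; this step is an addition to the answer, and d is the answer's name for the input.

```r
library(dplyr)

d <- data.frame(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 1.13))

d_full <- data.frame(col1 = seq(max(d$col1))) %>%
  left_join(d, by = "col1") %>%
  # coalesce() keeps existing col2 values and replaces NA with 0
  mutate(col2 = coalesce(col2, 0))
d_full
```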

Explosion°爆炸
Reply #7 · 2020-02-12 06:39

There are already some interesting answers here.

Just to hop in: we can create a sequence of numbers from 1 to max(col1) and then look up the matching col2 value with match, which returns the position of each sequence value in df$col1 (or NA if it is absent):

col1 = seq(1, max(df$col1))
data.frame(col1, col2 = df$col2[match(col1, df$col1)])

#   col1 col2
#1     1   NA
#2     2 0.02
#3     3   NA
#4     4   NA
#5     5 0.12
#6     6   NA
#7     7   NA
#8     8   NA
#9     9 0.91
#10   10   NA
#11   11   NA
#12   12   NA 
#13   13 1.13

This gives NA instead of 0 for the missing values. If we need zeros:

idx <- match(col1, df$col1)  # position in df$col1, or NA if absent
data.frame(col1, col2 = ifelse(is.na(idx), 0, df$col2[idx]))

#   col1 col2
#1     1 0.00
#2     2 0.02
#3     3 0.00
#4     4 0.00
#5     5 0.12
#6     6 0.00
#7     7 0.00
#8     8 0.00
#9     9 0.91
#10   10 0.00
#11   11 0.00
#12   12 0.00
#13   13 1.13
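Since col1 holds positive integers, the lookup can also be skipped by assigning values by position into a vector of zeros. This is a swapped-in technique rather than the match approach above, sketched under the assumption that col1 contains unique positive whole numbers.

```r
df <- data.frame(col1 = c(2, 5, 9, 13), col2 = c(0.02, 0.12, 0.91, 1.13))

# Start from a zero vector of length max(col1), then place each known
# value at the position given by col1
col2 <- numeric(max(df$col1))
col2[df$col1] <- df$col2

out <- data.frame(col1 = seq_along(col2), col2 = col2)
out
```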