Extend an irregular sequence and add zeros to missing values

Posted 2020-02-12 05:49

I have a data frame with a sequence in 'col1' and values in 'col2':

col1 col2
2     0.02
5     0.12
9     0.91
13    1.13

I want to expand the irregular sequence in 'col1' with a regular sequence from 1 to 13. For the values in 'col1' which are missing in the original data, I want 'col2' to have the value 0 in the final output:

col1  col2
1     0
2     0.02
3     0
4     0
5     0.12
6     0
7     0
8     0
9     0.91
10    0
11    0
12    0
13    1.13

How can I do this in R?

Tags: r
9 answers
欢心
#2 · 2020-02-12 06:43

Another way is the following, where your data is called mydf. First, create a data frame foo with a column col1 running from 1 to the maximum value of col1 in mydf. Then assign the values of col2 in mydf to a new column col2 in foo, using the numbers in mydf$col1 as row indices. At this point, the rows of foo$col2 that have no match in mydf are NA. The final step is to find those positions with is.na() and assign zeros to them.

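# Full sequence from 1 to the largest value of col1
foo <- data.frame(col1 = 1:max(mydf$col1))
# Place the known values at the rows given by mydf$col1; the rest become NA
foo$col2[mydf$col1] <- mydf$col2
# Replace the remaining NAs with 0
foo$col2[is.na(foo$col2)] <- 0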

Taking lmo's idea into account, you can create the data frame with col2 already set to 0 and skip the third step.

foo <- data.frame(col1 = 1:max(mydf$col1), col2 = 0)
foo$col2[mydf$col1] <- mydf$col2


#   col1 col2
#1     1 0.00
#2     2 0.02
#3     3 0.00
#4     4 0.00
#5     5 0.12
#6     6 0.00
#7     7 0.00
#8     8 0.00
#9     9 0.91
#10   10 0.00
#11   11 0.00
#12   12 0.00
#13   13 1.13

DATA

mydf <- structure(list(col1 = c(2L, 5L, 9L, 13L), col2 = c(0.02, 0.12, 
0.91, 1.13)), .Names = c("col1", "col2"), class = "data.frame", row.names = c(NA, 
-4L))
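
The index-assignment approach above relies on col1 holding positive integers that can be used directly as row positions. As a hedged sketch for the more general case (for example, a sequence that does not start at 1), match() can look up each target value instead. The helper name expand_seq and its arguments from and to are hypothetical and not part of the original answer.

```r
# Hypothetical helper: expand col1 to the full range from:to, filling gaps with 0.
expand_seq <- function(d, from = 1L, to = max(d$col1)) {
  full <- seq(from, to)
  # match() gives, for each target value, the row of d holding it (NA if absent)
  idx <- match(full, d$col1)
  vals <- d$col2[idx]
  # Target values with no matching row get 0
  vals[is.na(idx)] <- 0
  data.frame(col1 = full, col2 = vals)
}

mydf <- data.frame(col1 = c(2L, 5L, 9L, 13L),
                   col2 = c(0.02, 0.12, 0.91, 1.13))
res <- expand_seq(mydf)
```

Calling expand_seq(mydf, from = 0L) would also include a row for 0, which plain index assignment cannot do because R vectors are indexed from 1.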
chillily
#3 · 2020-02-12 06:45

Just for completeness, here is a binary self-join using data.table. You will get NAs instead of zeros, but that is easy to change if needed.

library(data.table)
setDT(df)[.(seq(max(col1))), on = .(col1)]
#     col1 col2
#  1:    1   NA
#  2:    2 0.02
#  3:    3   NA
#  4:    4   NA
#  5:    5 0.12
#  6:    6   NA
#  7:    7   NA
#  8:    8   NA
#  9:    9 0.91
# 10:   10   NA
# 11:   11   NA
# 12:   12   NA
# 13:   13 1.13
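
The NA-to-zero change mentioned above could look like the following sketch. It assumes the join result is stored in a new variable (res, a name chosen here for illustration) and uses data.table's := operator to update that result by reference.

```r
library(data.table)

df <- data.frame(col1 = c(2L, 5L, 9L, 13L),
                 col2 = c(0.02, 0.12, 0.91, 1.13))

# Same self-join as above, with the result kept in `res`
res <- setDT(df)[.(seq(max(col1))), on = .(col1)]

# Replace the NAs introduced by the join with zeros, in place
res[is.na(col2), col2 := 0]
```

Because := modifies res in place, no reassignment is needed after the last line.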
够拽才男人
#4 · 2020-02-12 06:45

Just to add a different point of view, consider that what you have can be seen as a sparse vector, i.e. a vector in which only the non-zero values are stored. Sparse vectors are implemented by the Matrix package in R. If df is your initial data.frame, try:

library(Matrix)
data.frame(col1 = seq_len(max(df$col1)),
           col2 = as.vector(sparseVector(df$col2, df$col1, max(df$col1))))
#   col1 col2
#1     1 0.00
#2     2 0.02
#3     3 0.00
#4     4 0.00
#5     5 0.12
#6     6 0.00
#7     7 0.00
#8     8 0.00
#9     9 0.91
#10   10 0.00
#11   11 0.00
#12   12 0.00
#13   13 1.13

The same result as a base R one-liner:

data.frame(col1=seq_len(max(df$col1)),
   col2=`[<-`(numeric(max(df$col1)),df$col1,df$col2))
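
To unpack the one-liner: `[<-` is the function behind subset assignment, so calling it directly is the functional form of x[i] <- value. Below is a step-by-step sketch of the same computation, with the intermediate names n, col2 and res chosen here purely for illustration.

```r
df <- data.frame(col1 = c(2L, 5L, 9L, 13L),
                 col2 = c(0.02, 0.12, 0.91, 1.13))

n <- max(df$col1)
col2 <- numeric(n)          # vector of n zeros
col2[df$col1] <- df$col2    # same effect as `[<-`(col2, df$col1, df$col2)
res <- data.frame(col1 = seq_len(n), col2 = col2)
```

Starting from numeric(n) is what makes the gaps come out as 0 rather than NA.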