Question:
I have a data frame that has 2 columns.
Column 1 has random numbers in it.
Column 2 is a placeholder column for what I want column 3 to look like.
random temp
0.502423373 1
0.687594055 0
0.741883739 0
0.445364032 0
0.50626137 0.5
0.516364981 0
...
I want to fill column 3 so that it takes the last non-zero number (1 or 0.5 in this example) and fills the following rows with that value until it hits the next row with a non-zero number. Then it repeats the process for the entire column.
random temp state
0.502423373 1 1
0.687594055 0 1
0.741883739 0 1
0.445364032 0 1
0.50626137 0.5 0.5
0.516364981 0 0.5
0.807804708 0 0.5
0.247948445 0 0.5
0.46573337 0 0.5
0.103705154 0 0.5
0.079625868 1 1
0.938928944 0 1
0.677713019 0 1
0.112231619 0 1
0.165907178 0 1
0.836195267 0 1
0.387712998 1 1
0.147737077 0 1
0.439281543 0.5 0.5
0.089013503 0 0.5
0.84174743 0 0.5
0.931738707 0 0.5
0.807955172 1 1
thanks for any and all help
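To try out the answers below, the example table can be entered as an R data frame. This is a minimal sketch: the name mydf follows Answer 1, and expected_state is an illustrative name for the desired column copied from the table above.

```r
# Example data from the question
mydf <- data.frame(
  random = c(0.502423373, 0.687594055, 0.741883739, 0.445364032, 0.50626137,
             0.516364981, 0.807804708, 0.247948445, 0.46573337, 0.103705154,
             0.079625868, 0.938928944, 0.677713019, 0.112231619, 0.165907178,
             0.836195267, 0.387712998, 0.147737077, 0.439281543, 0.089013503,
             0.84174743, 0.931738707, 0.807955172),
  temp   = c(1, 0, 0, 0, 0.5, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
             1, 0, 0.5, 0, 0, 0, 1)
)

# Desired "state" column, kept separately so each answer's result can be compared to it
expected_state <- c(1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
                    1, 1, 1, 1, 1, 1, 1, 1,
                    0.5, 0.5, 0.5, 0.5, 1)
```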
Answer 1:
Perhaps you can make use of na.locf from the "zoo" package after setting values of "0" to NA. Assuming your data.frame is called "mydf":
mydf$state <- mydf$temp
mydf$state[mydf$state == 0] <- NA
library(zoo)
mydf$state <- na.locf(mydf$state)
# random temp state
# 1 0.5024234 1.0 1.0
# 2 0.6875941 0.0 1.0
# 3 0.7418837 0.0 1.0
# 4 0.4453640 0.0 1.0
# 5 0.5062614 0.5 0.5
# 6 0.5163650 0.0 0.5
If there were NA values in your original data.frame in the "temp" column, and you wanted to keep them as NA in the newly generated "state" column too, that's easy to take care of. Just add one more line to reintroduce the NA values:
mydf$state[is.na(mydf$temp)] <- NA
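The effect of that extra line can be checked without installing zoo. In the sketch below, carry_forward() is a hypothetical base-R stand-in for na.locf (it fills each NA with the last non-NA value before it; it is not zoo's code), and the input vector is made up so that it contains one genuine NA.

```r
# Hypothetical stand-in for zoo::na.locf: fill each NA with the most
# recent non-NA value before it (leading NAs stay NA)
carry_forward <- function(x) {
  ok <- !is.na(x)
  idx <- c(NA, which(ok))[cumsum(ok) + 1]   # position of the last non-NA value
  x[idx]
}

temp <- c(1, 0, NA, 0, 0.5, 0)    # made-up "temp" column with a real NA
state <- temp
state[state == 0] <- NA           # zeros become gaps to fill
state <- carry_forward(state)     # the original NA gets filled as well
state[is.na(temp)] <- NA          # put the original NA back
state                             # 1, 1, NA, 1, 0.5, 0.5
```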
Answer 2:
Inspired by the solution of @Ananda Mahto, this is an adaptation of the internal code of na.locf that works directly with 0's instead of NAs. Then you don't need the zoo package, and you don't need the preprocessing step of changing the values to NA. Benchmark tests show that this is about 10 times faster than the original version.
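locf.0 <- function(x) {
  L <- x!=0                               # TRUE at each non-zero value
  idx <- c(0, which(L))[cumsum(L) + 1]    # index of the latest non-zero at or before each position
  return(x[idx])                          # note: leading zeros map to index 0 and are dropped
}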
mydf$state <- locf.0(mydf$temp)
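A short trace shows how the index lookup works and what happens when the input starts with a zero. The vector x is made up, and locf.0.keep is a hypothetical variant, not part of the original answer: it puts 1 instead of 0 in the first lookup slot. Only leading zeros ever use that slot, and in exactly that case x[1] is 0, so leading zeros stay 0 and the result keeps the input's length.

```r
locf.0 <- function(x) {            # Answer 2's function, unchanged
  L <- x != 0
  idx <- c(0, which(L))[cumsum(L) + 1]
  x[idx]
}

x <- c(0, 2, 0, 0, 5, 0)           # made-up input with a leading zero
L <- x != 0                        # FALSE TRUE FALSE FALSE TRUE FALSE
cumsum(L) + 1                      # 1 2 2 2 3 3: lookup slot used by each position
c(0, which(L))                     # 0 2 5: slot 1 (the 0) serves the leading zeros
locf.0(x)                          # 2 2 2 5 5: index 0 is dropped, one element short

# Hypothetical variant: slot 1 points at x[1] instead of index 0
locf.0.keep <- function(x) {
  L <- x != 0
  idx <- c(1, which(L))[cumsum(L) + 1]
  x[idx]
}
locf.0.keep(x)                     # 0 2 2 2 5 5
```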
Answer 3:
Here is an interesting way to do it with the Reduce function:
temp = c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
fill_zero = function(x,y) if(y==0) x else y
state = Reduce(fill_zero, temp, accumulate=TRUE)
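With accumulate=TRUE, Reduce starts the running value at the first element and then applies fill_zero(previous_result, next_element) along the vector, returning one value per input element. A quick check with a made-up vector that starts with zeros shows the behaviour at the front: there is nothing to carry forward yet, so leading zeros simply stay 0 (zoo's na.locf, by contrast, drops leading NAs by default).

```r
fill_zero <- function(x, y) if (y == 0) x else y   # Answer 3's helper

# made-up input that starts with zeros
res <- Reduce(fill_zero, c(0, 0, 2, 0, 3, 0), accumulate = TRUE)
res   # 0 0 2 2 3 3
```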
If you're worried about speed, you can try Rcpp.
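library(Rcpp)
cppFunction('
NumericVector fill_zeros( NumericVector x ) {
  // work on a copy: for a double vector, x shares memory with the R object
  // passed in, so writing to x directly would also overwrite temp in R
  NumericVector out = clone(x);
  for( int i=1; i<out.size(); i++ )
    if( out[i]==0 ) out[i] = out[i-1];
  return out;
}
')
state = fill_zeros(temp)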
Answer 4:
Also, unless I'm overlooking something, this seems to work:
DF$state2 <- ave(DF$temp, cumsum(DF$temp), FUN = function(x) x[x != 0])
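The grouping works because cumsum(DF$temp) changes only at rows where temp is non-zero, so each "non-zero value followed by zeros" run becomes one group; x[x != 0] then returns that group's single non-zero value, which ave() recycles over the group. This relies on the running sum never coming back to an earlier value (true here because all non-zero values are positive), and a group made only of leading zeros would make FUN return nothing. The second call is a hypothetical variant, not part of the answer: it numbers groups by counting non-zero rows and takes each group's first element. The input vector is made up.

```r
temp <- c(1, 0, 0, 0.5, 0, 1)            # made-up input

# Answer 4's grouping: the running sum only changes at non-zero rows
cumsum(temp)                              # 1.0 1.0 1.0 1.5 1.5 2.5 -> three groups
res_sum <- ave(temp, cumsum(temp), FUN = function(x) x[x != 0])
res_sum                                   # 1.0 1.0 1.0 0.5 0.5 1.0

# Hypothetical variant: the group id rises by exactly 1 at every non-zero
# value whatever its sign; a leading all-zero group just keeps its 0
res_count <- ave(temp, cumsum(temp != 0), FUN = function(x) x[1])
res_count                                 # 1.0 1.0 1.0 0.5 0.5 1.0
```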
DF
# random temp state state2
#1 0.50242337 1.0 1.0 1.0
#2 0.68759406 0.0 1.0 1.0
#3 0.74188374 0.0 1.0 1.0
#4 0.44536403 0.0 1.0 1.0
#5 0.50626137 0.5 0.5 0.5
#6 0.51636498 0.0 0.5 0.5
#7 0.80780471 0.0 0.5 0.5
#8 0.24794844 0.0 0.5 0.5
#9 0.46573337 0.0 0.5 0.5
#10 0.10370515 0.0 0.5 0.5
#11 0.07962587 1.0 1.0 1.0
#12 0.93892894 0.0 1.0 1.0
#13 0.67771302 0.0 1.0 1.0
#14 0.11223162 0.0 1.0 1.0
#15 0.16590718 0.0 1.0 1.0
#16 0.83619527 0.0 1.0 1.0
#17 0.38771300 1.0 1.0 1.0
#18 0.14773708 0.0 1.0 1.0
#19 0.43928154 0.5 0.5 0.5
#20 0.08901350 0.0 0.5 0.5
#21 0.84174743 0.0 0.5 0.5
#22 0.93173871 0.0 0.5 0.5
#23 0.80795517 1.0 1.0 1.0
Answer 5:
A loop along the following lines should do the trick for you:
# start at row 2: row 1 has no earlier row to copy from
for(i in seq_len(nrow(df))[-1])
{
  if (df[i,"v1"] == 0) df[i,"v1"] <- df[i-1,"v1"]
}
Output:
> df
v1 somedata
1 1 33
2 2 24
3 1 36
4 0 49
5 2 89
6 2 48
7 0 4
8 1 98
9 1 60
10 2 76
>
> for(i in seq(nrow(df)))
+ {
+ if (df[i,"v1"] == 0) df[i,"v1"] <- df[i-1,"v1"]
+ }
> df
v1 somedata
1 1 33
2 2 24
3 1 36
4 1 49
5 2 89
6 2 48
7 2 4
8 1 98
9 1 60
10 2 76
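Indexing a data frame element by element with df[i, "v1"] inside a loop is typically slow on large data. A sketch of the same loop that works on the column as a plain vector and writes it back once, using the data from the listing above (the name v is illustrative):

```r
# Answer 5's example data (the "before" listing)
df <- data.frame(v1 = c(1, 2, 1, 0, 2, 2, 0, 1, 1, 2),
                 somedata = c(33, 24, 36, 49, 89, 48, 4, 98, 60, 76))

v <- df$v1                       # work on a plain vector instead of df[i, "v1"]
for (i in seq_along(v)[-1]) {    # row 1 has no previous row to copy from
  if (v[i] == 0) v[i] <- v[i - 1]
}
df$v1 <- v                       # write the filled column back once
df$v1                            # 1 2 1 1 2 2 2 1 1 2
```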
Answer 6:
I suggest using the run length encoding functions; they are a natural way of dealing with streaks in a data set. Using @Kevin's example vector:
temp = c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
y <- rle(temp)
#str(y)
#List of 2
# $ lengths: int [1:11] 1 3 1 5 1 5 1 1 1 3 ...
# $ values : num [1:11] 1 0 0.5 0 1 0 1 0 0.5 0 ...
# - attr(*, "class")= chr "rle"
for( i in seq(y$values)[-1] ) {
if(y$values[i] == 0) {
y$lengths[i-1] = y$lengths[i] + y$lengths[i-1]
y$lengths[i] = 0
}
}
#str(y)
#List of 2
# $ lengths: num [1:11] 4 0 6 0 6 0 2 0 4 0 ...
# $ values : num [1:11] 1 0 0.5 0 1 0 1 0 0.5 0 ...
# - attr(*, "class")= chr "rle"
inverse.rle(y)
# [1] 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.5
# [20] 0.5 0.5 0.5 1.0
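Because rle() merges adjacent equal values, every run of zeros (except one at the very start) is preceded by a run of some non-zero value. That means the loop over runs can be replaced by one vectorised assignment that copies each zero run's predecessor value, leaving the lengths untouched. A sketch with the same temp vector; z and filled are illustrative names, and this is a variant rather than the answer's own code.

```r
temp <- c(1,0,0,0,.5,0,0,0,0,0,1,0,0,0,0,0,1,0,0.5,0,0,0,1)
y <- rle(temp)
z <- which(y$values == 0)        # positions of the zero runs
z <- z[z > 1]                    # a leading zero run has nothing before it
y$values[z] <- y$values[z - 1]   # take the value of the run just before
filled <- inverse.rle(y)         # same result as the loop above
```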
Answer 7:
Simply use a loop with a global variable. The global variable used here is m; r is a data frame with two columns, A and B.
r <- data.frame(A = seq_len(7), B = c(1, NA, NA, NA, 3, NA, 6))  # A: placeholder values
m = 1
for( i in 1:nrow(r) ){
  if( !is.na(r$B[i]) ){
    m <<- i   # please note the assign sign, " <<- "; at top level a plain <- does the same
    next
  } else {
    r$B[i] = r$B[m]
  }
}
After execution:
r$B = 1 1 1 1 3 3 6
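The same idea can be packaged as a function so that the "last seen value" lives in a local variable rather than a global one. A minimal sketch; fill_na_forward() is a hypothetical name, and column A gets placeholder values since the answer does not show them.

```r
# Hypothetical helper: Answer 7's loop with the state kept in a local variable
fill_na_forward <- function(v) {
  last <- NA                     # nothing seen yet, so leading NAs stay NA
  for (i in seq_along(v)) {
    if (is.na(v[i])) v[i] <- last else last <- v[i]
  }
  v
}

r <- data.frame(A = seq_len(7), B = c(1, NA, NA, NA, 3, NA, 6))  # A: placeholder
r$B <- fill_na_forward(r$B)      # 1 1 1 1 3 3 6
```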