How to filter data in R or excel?

2019-09-02 09:47发布

I have a data that looks like below in excel 2007.

Class   Var1    Var2
1       20      27
2               13
2        
2       11   
2        
1       11      27
2               61
2        
2        
2        
1       20      27
2       30      71
2       
2       

All Class=2 are individuals that are a member of the household (Class=1) directly above it. From may example, this data belongs to one household

Class   Var1    Var2
1       20      27
2               13
2        
2       11   
2        

What I would like to do is remove the data in the individual level and replace it with value of the household level. So from my example, the output should be,

Class   Var1    Var2
1       20      27
2       20      27
2       20      27
2       20      27   
2       20      27  

Is there an easy way to do this in excel or in R? I have 200k+ rows of data so doing it manually would take me forever.

I am not sure how to remove the values in the individual level. But once that can be done, I can just use excels Go to > Special > Blanks functions.

Thanks Dixi

标签: r excel filter
2条回答
一夜七次
2楼-- · 2019-09-02 10:21

I'm reading in your data as under -

df <- read.table(textConnection(
"ClassObj   Var1    Var2
1       20      27
2       NA       13
2       NA  NA
2       11   NA
2        NA NA
1       11      27
2       NA      71
2       NA NA
2       NA NA
"), header = TRUE)

And then processing it as below -

library(data.table)
dt <- data.table(df)

#Flagging each group of 1 and successive 2s uniquely
dt[,flag := 0]
dt[ClassObj == 1,flag := 1]
dt[,flag := cumsum(flag)]

#Copying down the value of classobj = 1 to all other rows with the same flag
dt[,Var1 := .SD[ClassObj == 1, Var1], by = "flag"]
dt[,Var2 := .SD[ClassObj == 1, Var2], by = "flag"]

To get the output -

> dt
   ClassObj Var1 Var2 flag
1:        1   20   27    1
2:        2   20   27    1
3:        2   20   27    1
4:        2   20   27    1
5:        2   20   27    1
6:        1   11   27    2
7:        2   11   27    2
8:        2   11   27    2
9:        2   11   27    2
查看更多
再贱就再见
3楼-- · 2019-09-02 10:27

If you read the data into R data.frame say DF. Then you can just reset values of Var1 and Var2 to NA first then use na.locf function from package zoo

DF
##    Class Var1 Var2
## 1      1   20   27
## 2      2   NA   NA
## 3      2   NA   NA
## 4      2   NA   NA
## 5      2   NA   NA
## 6      1   11   27
## 7      2   NA   NA
## 8      2   NA   NA
## 9      2   NA   NA
## 10     2   NA   NA
## 11     1   20   27
## 12     2   NA   NA
## 13     2   NA   NA
## 14     2   NA   NA

DF[DF$Class == 2, 2:3] <- NA
require(zoo)
DF <- na.locf(DF)
DF
##    Class Var1 Var2
## 1      1   20   27
## 2      2   20   27
## 3      2   20   27
## 4      2   20   27
## 5      2   20   27
## 6      1   11   27
## 7      2   11   27
## 8      2   11   27
## 9      2   11   27
## 10     2   11   27
## 11     1   20   27
## 12     2   20   27
## 13     2   20   27
## 14     2   20   27
查看更多
登录 后发表回答