Separate columns with constant numbers and condens

I have a data.frame called d. In this data.frame, some columns consist of constant numbers across the rows of the first column: study.name (see below).

For example, columns ESL, ESL.1, prof, and prof.1 are constant numbers for all rows of Shin.Ellis and also constant for all rows of Trus.Hsu and so on.

Q: In BASE R, how can I separate such constant variables, and then condense them to one row with only one number?

My desired output is shown further below. A functional answer is appreciated.

d <- read.csv("https://raw.githubusercontent.com/izeh/m/master/irr.csv", h = T)[-(2:3)]

## FIRST 8 ROWS:

#    study.name ESL prof scope type ESL.1 prof.1 scope.1 type.1
# 1  Shin.Ellis   1    2     1    1     1      2       1      1
# 2  Shin.Ellis   1    2     1    1     1      2       1      1
# 3  Shin.Ellis   1    2     1    2     1      2       1      1
# 4  Shin.Ellis   1    2     1    2     1      2       1      1
# 5  Shin.Ellis   1    2    NA   NA     1      2      NA     NA
# 6  Shin.Ellis   1    2    NA   NA     1      2      NA     NA
# 7    Trus.Hsu   2    2     2    1     2      2       1      1
# 8    Trus.Hsu   2    2    NA   NA     2      2      NA     NA

Desired output:

#    study.name ESL prof  ESL.1 prof.1 
# 1  Shin.Ellis   1    2      1      2  
# 2  Trus.Hsu     2    2      2      2
# .     .         .    .      .      . # AND SO ON !!!

标签： r function loops dataframe

4条回答

爷、活的狠高调

2楼-- · 2020-02-13 04:23

You could try something like this, though it feels a bit clumsy. Basically, check which columns have constant values by group for all groups, keep only those columns, and then keep only unique values (since now they are constant by group).

d <- read.table(header = TRUE,
text = "study.name ESL prof scope type ESL.1 prof.1 scope.1 type.1
Shin.Ellis   1    2     1    1     1      2       1      1
Shin.Ellis   1    2     1    1     1      2       1      1
Shin.Ellis   1    2     1    2     1      2       1      1
Shin.Ellis   1    2     1    2     1      2       1      1
Shin.Ellis   1    2    NA   NA     1      2      NA     NA
Shin.Ellis   1    2    NA   NA     1      2      NA     NA
  Trus.Hsu   2    2     2    1     2      2       1      1
  Trus.Hsu   2    2    NA   NA     2      2      NA     NA")

is_constant <- function(x) length(unique(x)) == 1L

keep_constants <- function(df, group_col) {
  data_cols <- colnames(df)[setdiff(1:ncol(df), group_col)]
  check_df <- aggregate(df, by = list(df[[group_col]]), FUN = is_constant)
  cols_to_keep <- sapply(check_df[, -1], all)
  unique(df[, cols_to_keep])
}

keep_constants(d, 1)
#>   study.name ESL prof ESL.1 prof.1
#> 1 Shin.Ellis   1    2     1      2
#> 7   Trus.Hsu   2    2     2      2

^{Created on 2019-10-09 by the reprex package (v0.3.0)}

0人赞添加讨论(0) 举报

虎瘦雄心在

3楼-- · 2020-02-13 04:33

If you just want to remove repeated values across all columns unique() is base R

unique(d)

EDIT - Thanks for the clarification @CalumYou - I think this is what OP is looking for in base R.

is_constant = lapply(split(d, d$study.name), function(data){
  unlist(lapply(data,function(col){
    length(unique(col)) == 1
  }))
})
is_constant = as.data.frame(do.call(rbind, is_constant))
all_constant = d[,unlist(lapply(is_constant,all))]
all_constant = unique(all_constant)

0人赞添加讨论(0) 举报

啃猪蹄的小仙女

4楼-- · 2020-02-13 04:39

May be we need

library(dplyr)
d %>%
   group_by(study.name) %>%
   slice(1)

Or in base R after grouping by 'study.name', get the first row while specifying the na.action = NULL as the default option is na.omit which can omit any row having NA in any of the columns

aggregate(.~ study.name, d, head, 1, na.action = NULL)

If we want to subset the columns

nm1 <- names(which(!colSums(!do.call(rbind, by(d[-1], d$study.name,
     FUN = function(x) lengths(sapply(x, unique)) == 1)))))
unique(d[c("study.name", nm1)])

0人赞添加讨论(0) 举报

兄弟一词,经得起流年.

5楼-- · 2020-02-13 04:46

d_list <- lapply(split(d,d$study.name), 
                 #Find columns with similar values using sapply and length(unique(cols)) 
                 #then get the 1st row
                 function(x) x[1, sapply(x,function(y) length(unique(y))==1)])
do.call('rbind.data.frame',d_list)

           study.name ESL prof ESL.1 prof.1
Bit.KnoA     Bit.KnoA   1    3     1      3
Bit.KnoB     Bit.KnoB   1    2     1      2
ChandlerA   ChandlerA   1    2     1      2
Mubarak       Mubarak   2   NA     2     NA
SheenA         SheenA   1    2     1      2
Shin.Ellis Shin.Ellis   1    2     1      2
Sun               Sun   2    2     2      2
Trus.Hsu     Trus.Hsu   2    2     2      2

0人赞添加讨论(0) 举报

Separate columns with constant numbers and condens

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间