从tidyr的输出删除不完整的情况下 - 聚() - R的(Removing incomplete

2019-10-20 14:37发布

我有一个数据帧,看起来像这样乱糟糟的数据。

在这里,您可以在“团队”的一些足球队的名字看看。 Name1-3是可变的房源用来指代这些球队在第一列中的不同的名字。

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town                        

我的目标是使数据进入2列与队名1,队名2,团队NAME3配对。 我只是想保持这些配对那里是在1,名称或名称3数据。

要做到这一点,我想tidyr's- gather()

temp <- dat %>% gather(key, value, 2:4) 
temp$key<-NULL
temp

这给出了以下的输出:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town                  

我试图除去不完全的情况下(例如行20,21,23,24而不是22),使用:

temp[complete.cases(temp),]

这不起作用作为看似空值观察包含文字“” -我想这是如何gather()返回丢失的数据? 我试着将temp$value的因素,但这也不能工作。

我很想听听如何摆脱不完整的情况。

样本数据...

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

Answer 1:

您还可以添加filter (以去除空白),并select (为了除去key列)从dplyr封装得到的一切一气呵成

temp <- dat %>% 
  gather(key, value, 2:4) %>% 
  filter(value != "") %>%
  select(-key)

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United


Answer 2:

你是否在寻找: temp[temp$value!='',]gather不被指责为空字符串,你的初始数据有他们太多。 你可以替换他们首先然后用na.rm在参数gather

dat[dat==''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm=TRUE) 
temp$key<-NULL
tempA


Answer 3:

类似的方法,但利用na.omit的:

dat %>% 
  gather(key, value, -team) %>% 
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)


文章来源: Removing incomplete cases from output of tidyr - gather() - r