Removing incomplete cases from output of tidyr - g

I have untidy data in a dataframe that looks like this.

Here you can see in 'team' the names of some soccer teams. Name1-3 are variable listing the different names used to refer to these teams in the first column.

               team             name1        name2      name3
1      Loughborough      Loughborough                        
2        Luton Town        Luton Town        Luton           
3      Macclesfield      Macclesfield                        
4  Maidstone United  Maidstone United                        
5   Manchester City   Manchester City     Man City           
6 Manchester United Manchester United Newton Heath Man United
7    Mansfield Town    Mansfield Town    Mansfield           
8      Merthyr Town      Merthyr Town

My aim is to get the data into 2 columns with team-name1, team-name2, team-name3 pairings. I only want to keep those pairings where there is data in name1, name2 or name3.

To do this, I am trying tidyr's- gather()

temp <- dat %>% gather(key, value, 2:4) 
temp$key<-NULL
temp

This gives the following output:

                team             value
1       Loughborough      Loughborough
2         Luton Town        Luton Town
3       Macclesfield      Macclesfield
4   Maidstone United  Maidstone United
5    Manchester City   Manchester City
6  Manchester United Manchester United
7     Mansfield Town    Mansfield Town
8       Merthyr Town      Merthyr Town
9       Loughborough                  
10        Luton Town             Luton
11      Macclesfield                  
12  Maidstone United                  
13   Manchester City          Man City
14 Manchester United      Newton Heath
15    Mansfield Town         Mansfield
16      Merthyr Town                  
17      Loughborough                  
18        Luton Town                  
19      Macclesfield                  
20  Maidstone United                  
21   Manchester City                  
22 Manchester United        Man United
23    Mansfield Town                  
24      Merthyr Town

I tried to remove incomplete cases (e.g. rows 20,21, 23,24 but not 22), using:

temp[complete.cases(temp),]

This didn't work as the seemingly empty value observations contain a character "" - I guess this is how gather() returns missing data?. I tried converting temp$value to a factor but this didn't work either.

I'd love to hear how to get rid of the incomplete cases.

Sample data...

dat<-structure(list(team = structure(1:8, .Label = c("Loughborough", 
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City", 
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"), 
    name1 = structure(1:8, .Label = c("Loughborough", "Luton Town", 
    "Macclesfield", "Maidstone United", "Manchester City", "Manchester United", 
    "Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L, 
    2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City", 
    "Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team", 
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")

标签： r dplyr reshape2 tidyr

3条回答

ら.Afraid

2楼-- · 2019-08-05 07:10

You could also add filter (in order to remove blanks) and select (in order to remove key column) from dplyr package and get everything in one go

temp <- dat %>% 
  gather(key, value, 2:4) %>% 
  filter(value != "") %>%
  select(-key)

#                 team             value
# 1       Loughborough      Loughborough
# 2         Luton Town        Luton Town
# 3       Macclesfield      Macclesfield
# 4   Maidstone United  Maidstone United
# 5    Manchester City   Manchester City
# 6  Manchester United Manchester United
# 7     Mansfield Town    Mansfield Town
# 8       Merthyr Town      Merthyr Town
# 9         Luton Town             Luton
# 10   Manchester City          Man City
# 11 Manchester United      Newton Heath
# 12    Mansfield Town         Mansfield
# 13 Manchester United        Man United

0人赞添加讨论(0) 举报

唯我独甜

3楼-- · 2019-08-05 07:22

similar approach, but making use of na.omit:

dat %>% 
  gather(key, value, -team) %>% 
  select(-key) %>%
  mutate(value = ifelse(value == "", NA, value)) %>%
  na.omit %>%
  arrange(team)

0人赞添加讨论(0) 举报

Bombasti

4楼-- · 2019-08-05 07:26

Are you looking for: temp[temp$value!='',]? gather isn't to be blamed for the empty strings, your initial data had them too. You could replace them first then use the na.rm argument in gather:

dat[dat==''] <- NA
temp <- dat %>% gather(key, value, 2:4, na.rm=TRUE) 
temp$key<-NULL
tempA

0人赞添加讨论(0) 举报

Removing incomplete cases from output of tidyr - g

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间