I have untidy data in a dataframe that looks like this.
Here you can see in 'team' the names of some soccer teams. Name1-3 are variable listing the different names used to refer to these teams in the first column.
team name1 name2 name3
1 Loughborough Loughborough
2 Luton Town Luton Town Luton
3 Macclesfield Macclesfield
4 Maidstone United Maidstone United
5 Manchester City Manchester City Man City
6 Manchester United Manchester United Newton Heath Man United
7 Mansfield Town Mansfield Town Mansfield
8 Merthyr Town Merthyr Town
My aim is to get the data into 2 columns with team-name1, team-name2, team-name3 pairings. I only want to keep those pairings where there is data in name1, name2 or name3.
To do this, I am trying tidyr's- gather()
temp <- dat %>% gather(key, value, 2:4)
temp$key<-NULL
temp
This gives the following output:
team value
1 Loughborough Loughborough
2 Luton Town Luton Town
3 Macclesfield Macclesfield
4 Maidstone United Maidstone United
5 Manchester City Manchester City
6 Manchester United Manchester United
7 Mansfield Town Mansfield Town
8 Merthyr Town Merthyr Town
9 Loughborough
10 Luton Town Luton
11 Macclesfield
12 Maidstone United
13 Manchester City Man City
14 Manchester United Newton Heath
15 Mansfield Town Mansfield
16 Merthyr Town
17 Loughborough
18 Luton Town
19 Macclesfield
20 Maidstone United
21 Manchester City
22 Manchester United Man United
23 Mansfield Town
24 Merthyr Town
I tried to remove incomplete cases (e.g. rows 20,21, 23,24 but not 22), using:
temp[complete.cases(temp),]
This didn't work as the seemingly empty value observations contain a character "" - I guess this is how gather()
returns missing data?. I tried converting temp$value
to a factor but this didn't work either.
I'd love to hear how to get rid of the incomplete cases.
Sample data...
dat<-structure(list(team = structure(1:8, .Label = c("Loughborough",
"Luton Town", "Macclesfield", "Maidstone United", "Manchester City",
"Manchester United", "Mansfield Town", "Merthyr Town"), class = "factor"),
name1 = structure(1:8, .Label = c("Loughborough", "Luton Town",
"Macclesfield", "Maidstone United", "Manchester City", "Manchester United",
"Mansfield Town", "Merthyr Town"), class = "factor"), name2 = structure(c(1L,
2L, 1L, 1L, 3L, 5L, 4L, 1L), .Label = c("", "Luton", "Man City",
"Mansfield", "Newton Heath"), class = "factor"), name3 = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L), .Label = c("", "Man United"), class = "factor")), .Names = c("team",
"name1", "name2", "name3"), row.names = c(NA, -8L), class = "data.frame")
You could also add
filter
(in order to remove blanks) andselect
(in order to removekey
column) fromdplyr
package and get everything in one gosimilar approach, but making use of na.omit:
Are you looking for:
temp[temp$value!='',]
?gather
isn't to be blamed for the empty strings, your initial data had them too. You could replace them first then use thena.rm
argument ingather
: