Separate values in a column of a dataframe and mel

2020-04-20 05:48发布

问题:

I have a data frame where I want to separate values in the Client.ID column and melt, so each row contains one Client.ID and and the corresponding Account.Name and owner.

> head(df)
  Account.Owner       Account.Name                   Client.ID
1    Deb Berman     Albertsons LLC      3184, 3186, 3185, 2578
2    Deb Berman        All Recipes                   909, 4937
3    Liz Madsen   American Express                   1230,1236
4    Deb Berman  Bed Bath & Beyond                  1180, 1556
5    Deb Berman           Birchbox 101, 1704, 5149, 5150, 5148
6   Jeff Murphy Brown Shoe Company            5402, 6159, 6160

At the end I want it to look like so

Account.Owner       Account.Name                   Client.ID
    1    Deb Berman     Albertsons LLC                  3184  
    2    Deb Berman     Albertsons LLC                  3186
    3    Deb Berman     Albertsons LLC                  3185

Thanks.

回答1:

I would suggest my cSplit function for a problem like this. The solution becomes:

cSplit(mydf, "Client.ID", ",", "long")
#     Account.Owner       Account.Name Client.ID
#  1:    Deb Berman     Albertsons LLC      3184
#  2:    Deb Berman     Albertsons LLC      3186
#  3:    Deb Berman     Albertsons LLC      3185
#  4:    Deb Berman     Albertsons LLC      2578
#  5:    Deb Berman        All Recipes       909
#  6:    Deb Berman        All Recipes      4937
#  7:    Liz Madsen   American Express      1230
#  8:    Liz Madsen   American Express      1236
#  9:    Deb Berman  Bed Bath & Beyond      1180
# 10:    Deb Berman  Bed Bath & Beyond      1556
# 11:    Deb Berman           Birchbox       101
# 12:    Deb Berman           Birchbox      1704
# 13:    Deb Berman           Birchbox      5149
# 14:    Deb Berman           Birchbox      5150
# 15:    Deb Berman           Birchbox      5148
# 16:   Jeff Murphy Brown Shoe Company      5402
# 17:   Jeff Murphy Brown Shoe Company      6159
# 18:   Jeff Murphy Brown Shoe Company      6160

The arguments used here are: (1) the data.frame or data.table to work with, (2) the column or columns that need to be split up, (3) the separator, and (4) whether the result should be "wide" or "long".

You may want to modify it as follows: cSplit(mydf, "Client.ID", ", ", "long") or cSplit(mydf, "Client.ID", ",|,\\s+", "long", fixed = FALSE) depending on how nice and clean your "Client.ID" column really is.

This assumes we're starting with the following sample dataset:

mydf <- structure(list(Account.Owner = c("Deb Berman", "Deb Berman", 
   "Liz Madsen", "Deb Berman", "Deb Berman", "Jeff Murphy"), 
   Account.Name = c("Albertsons LLC", "All Recipes", "American Express", 
   "Bed Bath & Beyond", "Birchbox", "Brown Shoe Company"), 
   Client.ID = c("3184, 3186, 3185, 2578", "909, 4937", "1230,1236", 
   "1180, 1556", "101, 1704, 5149, 5150, 5148", "5402, 6159, 6160")), 
   .Names = c("Account.Owner", "Account.Name", "Client.ID"), 
   class = c("data.table", "data.frame"), row.names = c(NA, -6L))