I want to generate 8 combinations of names from a

Posted 2019-07-11 13:14

Question:

I have a data frame with 20 players from 4 different teams (5 players per team), each assigned a salary from a fantasy draft. I would like to create all combinations of 8 players whose total salary is less than or equal to 10000 and whose total points are greater than x, excluding any combination that contains 4 or more players from the same team.

Here is what my data frame looks like:

       Team      Player    K   D    A    LH Points Salary    PPS
  4     ATN  ExoticDeer  6.1 3.3  6.4 306.9 22.209   1622 1.3692
  2     ATN     Supreme  6.8 5.3  7.1 229.4 21.954   1578 1.3913
  1     ATN        sasu  3.6 6.4 11.0  95.7 19.357   1244 1.5560
  3     ATN eL lisasH 2  2.6 6.1  7.9  29.7 12.037    998 1.2061
  5     ATN       Nisha  2.7 5.6  7.5  48.2 12.282    955 1.2861
  11     CL Swiftending  6.0 5.8  7.8 360.5 22.285   1606 1.3876
  13     CL     Pajkatt 13.3 7.5  9.3 326.8 37.248   1489 2.5015
  15     CL  SexyBamboe  6.3 8.5  9.3 168.0 20.660   1256 1.6449
  14     CL         EGM  2.8 6.0 13.5  78.8 21.988    989 2.2233
  12     CL       Saksa  2.5 6.5 10.5  59.8 15.898    967 1.6441
  51 DBEARS         Ace  7.0 3.4  6.9 195.6 23.596   1578 1.4953
  31 DBEARS    HesteJoe  5.4 5.4  6.1 176.7 16.927   1512 1.1195
  61 DBEARS      Miggel  2.8 6.8 11.0 141.8 17.818   1212 1.4701
  21 DBEARS        Noia  3.0 6.0  8.0  36.1 13.161    970 1.3568
  41 DBEARS        Ryze  2.7 4.7  6.7  74.6 12.166    937 1.2984
  8      GB Keyser Soze  6.0 5.0  5.6 316.0 19.120   1602 1.1935
  9      GB      Madara  5.4 5.3  6.6 334.5 19.405   1577 1.2305
  10     GB     SkyLark  1.8 5.3  7.0  71.8 10.218   1266 0.8071
  7      GB         MNT  2.3 5.9  6.1  85.6  9.316   1007 0.9251
  6      GB   SKANKS224  1.4 7.6  7.4  52.5  7.565    954 0.7930

I am following the general concept described in the post "I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less", tweaking the code to suit my needs. This is what I have so far:

## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn, 8))
## convert the names to a string,
## find the column sums of the others,
## set the names
yy <- setNames(
  lapply(xx, function(x) {
    if (typeof(x) == "character") apply(x, 2, toString) else colSums(x)
  }),
  names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)

Using the above code I am able to generate all possible lineups of 8 players and then subset them by various criteria (total salary and number of points), but I am struggling to exclude the lineups that contain more than 3 players from the same team.

I imagine the lineups would need to be excluded from newdf but I don't really know where to begin in doing that.

Here are the dput results:

structure(list(Team = c("ATN", "ATN", "ATN", "ATN", "ATN", "CL", 
"CL", "CL", "CL", "CL", "DBEARS", "DBEARS", "DBEARS", "DBEARS", 
"DBEARS", "GB", "GB", "GB", "GB", "GB"), Player = structure(c(2L, 
5L, 4L, 1L, 3L, 15L, 12L, 14L, 11L, 13L, 16L, 18L, 19L, 20L, 
21L, 6L, 7L, 10L, 8L, 9L), .Label = c("eL lisasH 2", "ExoticDeer", 
"Nisha", "sasu", "Supreme", "Keyser Soze", "Madara", "MNT", "SKANKS224", 
"SkyLark", "EGM", "Pajkatt", "Saksa", "SexyBamboe", "Swiftending", 
"Ace", "DruidzOzoneShoc", "HesteJoe", "Miggel", "Noia", "Ryze"
), class = "factor"), K = c(6.1, 6.8, 3.6, 2.6, 2.7, 6, 13.3, 
6.3, 2.8, 2.5, 7, 5.4, 2.8, 3, 2.7, 6, 5.4, 1.8, 2.3, 1.4), D = c(3.3, 
5.3, 6.4, 6.1, 5.6, 5.8, 7.5, 8.5, 6, 6.5, 3.4, 5.4, 6.8, 6, 
4.7, 5, 5.3, 5.3, 5.9, 7.6), A = c(6.4, 7.1, 11, 7.9, 7.5, 7.8, 
9.3, 9.3, 13.5, 10.5, 6.9, 6.1, 11, 8, 6.7, 5.6, 6.6, 7, 6.1, 
7.4), LH = c(306.9, 229.4, 95.7, 29.7, 48.2, 360.5, 326.8, 168, 
78.8, 59.8, 195.6, 176.7, 141.8, 36.1, 74.6, 316, 334.5, 71.8, 
85.6, 52.5), Points = c(22.209, 21.954, 19.357, 12.037, 12.282, 
22.285, 37.248, 20.66, 21.988, 15.898, 23.596, 16.927, 17.818, 
13.161, 12.166, 19.12, 19.405, 10.218, 9.316, 7.565), Salary = c(1622, 
1578, 1244, 998, 955, 1606, 1489, 1256, 989, 967, 1578, 1512, 
1212, 970, 937, 1602, 1577, 1266, 1007, 954), PPS = c(1.3692, 
1.3913, 1.556, 1.2061, 1.2861, 1.3876, 2.5015, 1.6449, 2.2233, 
1.6441, 1.4953, 1.1195, 1.4701, 1.3568, 1.2984, 1.1935, 1.2305, 
0.8071, 0.9251, 0.793)), .Names = c("Team", "Player", "K", "D", 
"A", "LH", "Points", "Salary", "PPS"), class = "data.frame", row.names = c("4", 
"2", "1", "3", "5", "11", "13", "15", "14", "12", "51", "31", 
"61", "21", "41", "8", "9", "10", "7", "6"))

Answer 1:

Here's one way:

splt.names <- strsplit(as.character(newdf$Player), ", ")
indices <- lapply(splt.names, function(x) match(x, FantasyPlayers$Player))
exclude <- lapply(indices, function(x) any(table(FantasyPlayers$Team[x]) > 3))
newdf2 <- newdf[!unlist(exclude), ]

First, split the Player column of newdf on the comma separator. Then match the player names against the FantasyPlayers$Player column to get row indices. With those indices we can do the main work, which is any(table(FantasyPlayers$Team[x]) > 3). This checks whether any team's count exceeds three, which indicates 4 or more players from the same team; those lineups are then dropped.
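To see the team-count check in isolation, here is a minimal sketch on made-up data. The `players` data frame, the helper name `too_many_from_one_team`, and the two lineups are hypothetical and only illustrate the `table()` test; they are not part of the original answer.

```r
# Hypothetical toy data: 6 players on 2 teams
players <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B"),
  Player = c("p1", "p2", "p3", "p4", "p5", "p6"),
  stringsAsFactors = FALSE
)

# TRUE when any single team supplies more than `limit` players to the lineup
too_many_from_one_team <- function(lineup, limit = 3) {
  idx <- match(lineup, players$Player)    # row index of each named player
  any(table(players$Team[idx]) > limit)   # per-team counts within the lineup
}

lineup_bad <- c("p1", "p2", "p3", "p4", "p5")  # four players from team A
lineup_ok  <- c("p1", "p2", "p3", "p5", "p6")  # three from A, two from B

too_many_from_one_team(lineup_bad)  # TRUE: this lineup would be excluded
too_many_from_one_team(lineup_ok)   # FALSE: this lineup is kept
```

The same function could be applied to each element of splt.names with sapply() to build the exclusion vector directly.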



Answer 2:

Best to construct this in long form, I think:

Construct teams

library(data.table)
setDT(FantasyPlayers)

xx    <- combn(as.character(FantasyPlayers$Player), 8)
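# melt() on a matrix is the reshape2 array method (see ?melt.array), so call it explicitly
mxx   <- setDT(reshape2::melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))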

head(mxx,10)
#     jersey_no team_no      Player
#  1:         1       1  ExoticDeer
#  2:         2       1     Supreme
#  3:         3       1        sasu
#  4:         4       1 eL lisasH 2
#  5:         5       1       Nisha
#  6:         6       1 Swiftending
#  7:         7       1     Pajkatt
#  8:         8       1  SexyBamboe
#  9:         1       2  ExoticDeer
# 10:         2       2     Supreme

Groups of 8 players share a team_no and are indexed by their jersey_no. Look at ?melt.array to see how this works. setDT just converts the resulting data.frame to a data.table for easier merging.
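For readers who want to see what the long form contains without melting anything, the following base R sketch builds the same kind of table by hand from a small combn() matrix. The four single-letter names are made up, and using row() and col() here is a stand-in for illustration, not the answer's actual method.

```r
# Hypothetical small case: all groups of 3 out of 4 names
xx <- combn(c("a", "b", "c", "d"), 3)   # 3 x 4 matrix, one group per column

# Long form: one row per (position within group, group number) pair
long <- data.frame(
  jersey_no = as.vector(row(xx)),   # row index = position within the group
  team_no   = as.vector(col(xx)),   # column index = which group
  Player    = as.vector(xx),
  stringsAsFactors = FALSE
)

head(long, 4)
nrow(long)   # 12 rows: 4 groups of 3
```

In the full problem the matrix has choose(20, 8) = 125970 columns of 8 players each, which is where the 1007760 rows of the merged table come from.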

Merge to recover Player attributes

FantasyTeams <- FantasyPlayers[mxx, on="Player"]

#          Team      Player   K   D    A    LH Points Salary    PPS jersey_no team_no
#       1:  ATN  ExoticDeer 6.1 3.3  6.4 306.9 22.209   1622 1.3692         1       1
#       2:  ATN     Supreme 6.8 5.3  7.1 229.4 21.954   1578 1.3913         2       1
#       3:  ATN        sasu 3.6 6.4 11.0  95.7 19.357   1244 1.5560         3       1
#       4:  ATN eL lisasH 2 2.6 6.1  7.9  29.7 12.037    998 1.2061         4       1
#       5:  ATN       Nisha 2.7 5.6  7.5  48.2 12.282    955 1.2861         5       1
#      ---                                                                           
# 1007756:   GB Keyser Soze 6.0 5.0  5.6 316.0 19.120   1602 1.1935         4  125970
# 1007757:   GB      Madara 5.4 5.3  6.6 334.5 19.405   1577 1.2305         5  125970
# 1007758:   GB     SkyLark 1.8 5.3  7.0  71.8 10.218   1266 0.8071         6  125970
# 1007759:   GB         MNT 2.3 5.9  6.1  85.6  9.316   1007 0.9251         7  125970
# 1007760:   GB   SKANKS224 1.4 7.6  7.4  52.5  7.565    954 0.7930         8  125970

By default, only the first and last several rows of a data.table are printed. To examine the whole thing, try ?View or look at the arguments to ?print.data.table.

Filter to a set of teams with chosen features

To filter to those team_no having no more than three players from the same Team...

my_teams <- FantasyTeams[, max(table(Team)) <= 3, by=team_no][V1==TRUE]$team_no

V1 is the default name assigned to the constructed variable max(table(Team)) <= 3. This is not lightning fast, but now that you have excluded some teams, later subsetting steps should be faster:

my_new_teams <- 
  FantasyTeams[team_no %in% my_teams, sum(Salary) <= 10000, by=team_no][V1==TRUE]$team_no

To save a few keystrokes and microseconds, substitute (V1) for V1==TRUE; that is the idiomatic form.
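The two filtering forms can be compared on a tiny table. This is only a sketch: the table `dt`, its values, and the cutoff of 50 are invented for illustration.

```r
library(data.table)

# Hypothetical toy table: two teams of two players each
dt <- data.table(team_no = c(1L, 1L, 2L, 2L),
                 Salary  = c(10, 20, 40, 50))

# One logical flag per team; the unnamed j expression gets the default name V1
flags <- dt[, sum(Salary) < 50, by = team_no]

flags[V1 == TRUE]$team_no   # explicit comparison
flags[(V1)]$team_no         # same rows; the parentheses make V1 a logical filter
```

The parentheses matter: they tell data.table to evaluate V1 as a logical row filter rather than treat the bare name as something else.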

Recovering the roster from a set of teams

To get the roster associated with each team, join/merge with mxx:

mxx[.(team_no = my_new_teams), on="team_no"]

If you want the players listed on a single line, as in the OP:

mxx[.(team_no = my_new_teams), .(roster = toString(Player)), on="team_no", by=.EACHI]

If you want aggregate statistics for each team, you'll instead need to join with FantasyTeams:

FantasyTeams[.(team_no = my_new_teams), .(
  roster     = toString(Player),
  tot_salary = sum(Salary),
  tot_points = sum(Points)
), on="team_no", by=.EACHI]

#        team_no                                                              roster tot_salary tot_points
#     1:    3716      ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, Ryze       9913    149.018
#     2:    3720       ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, MNT       9983    146.168
#     3:    3721 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, SKANKS224       9930    144.417
#     4:    3725       ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, MNT       9950    145.173
#     5:    3726 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, SKANKS224       9897    143.422
#    ---                                                                                                  
# 40202:  125663         EGM, Saksa, Miggel, Noia, Ryze, Keyser Soze, MNT, SKANKS224       8638    117.032
# 40203:  125664                EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, MNT       8925    119.970
# 40204:  125665          EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, SKANKS224       8872    118.219
# 40205:  125666              EGM, Saksa, Miggel, Noia, Ryze, Madara, MNT, SKANKS224       8613    117.317
# 40206:  125667             EGM, Saksa, Miggel, Noia, Ryze, SkyLark, MNT, SKANKS224       8302    108.130

To understand what by=.EACHI is doing, a little background is needed. The merge syntax here is DT[i, j, on=cols, by=.EACHI].

  • If j and by are left out, it just does the merge, as in the construction of FantasyTeams.
  • If by is left out, but j is included, j is computed after the merge.
  • If by=.EACHI, then j is computed separately for each value in i.
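The three cases can be tried side by side on a small table. The sketch below uses invented data (`teams`, `keep`) and is meant only to show how the result shape changes between the cases.

```r
library(data.table)

# Hypothetical long-form table: three teams of two players each
teams <- data.table(
  team_no = c(1L, 1L, 2L, 2L, 3L, 3L),
  Player  = c("a", "b", "a", "c", "b", "c"),
  Salary  = c(10, 20, 10, 30, 20, 30)
)
keep <- c(1L, 3L)   # the team numbers we want to look up

# No j, no by: a plain join that returns the matching rows
joined <- teams[.(team_no = keep), on = "team_no"]

# j without by: j is computed once over all joined rows
overall <- teams[.(team_no = keep), .(tot_salary = sum(Salary)), on = "team_no"]

# by = .EACHI: j is computed separately for each row of i (each team_no in keep)
per_team <- teams[.(team_no = keep),
                  .(roster = toString(Player), tot_salary = sum(Salary)),
                  on = "team_no", by = .EACHI]

joined
overall
per_team
```

Here `joined` has four rows, `overall` collapses them into a single total, and `per_team` has one row per requested team.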