I wonder how to set up some example some fundamental matching procedures in R. There are many examples in various programming languages, but I have not yet found a good example for R.
Let’s say I want to match students to projects and I would consider 3 alternative approaches which I came across when googling on this issue:
1) Bipartite matching case: I ask each student to name 3 projects to work on (without stating any preference ranking among those 3).
ID T.1 T.2 T.3 T.4 T.5 T.6 T.7
1 1 1 1 0 0 0 0
2 0 0 0 0 1 1 1
3 0 1 1 1 0 0 0
4 0 0 0 1 1 1 0
5 1 0 1 0 1 0 0
6 0 1 0 0 0 1 1
7 0 1 1 0 1 0 0
--
d.1 <- structure(list(Student.ID = 1:7, Project.1 = c(1L, 0L, 0L, 0L,
1L, 0L, 0L), Project.2 = c(1L, 0L, 1L, 0L, 0L, 1L, 1L), Project.3 = c(1L,
0L, 1L, 0L, 1L, 0L, 1L), Project.4 = c(0L, 0L, 1L, 1L, 0L, 0L,
0L), Project.5 = c(0L, 1L, 0L, 1L, 1L, 0L, 1L), Project.6 = c(0L,
1L, 0L, 1L, 0L, 1L, 0L), Project.7 = c(0L, 1L, 0L, 0L, 0L, 1L,
0L)), .Names = c("Student.ID", "Project.1", "Project.2", "Project.3",
"Project.4", "Project.5", "Project.6", "Project.7"), class = "data.frame", row.names = c(NA,
-7L))
2) Hungarian algorithm: I ask each student name 3 projects to work on WITH stating a preference ranking among those 3. As far as I understood the reasoning when applying the algorithm in this case would be something like: the better the rank the lower the “costs” to the student.
ID T.1 T.2 T.3 T.4 T.5 T.6 T.7
1 3 2 1 na na na na
2 na na na na 1 2 3
3 na 1 3 2 na na na
4 na na na 1 2 3 na
5 2 na 3 na 1 na na
6 na 3 na na na 2 1
7 na 1 2 na 3 na na
--
d.2 <- structure(list(Student.ID = 1:7, Project.1 = structure(c(2L, 3L,
3L, 3L, 1L, 3L, 3L), .Label = c("2", "3", "na"), class = "factor"),
Project.2 = structure(c(2L, 4L, 1L, 4L, 4L, 3L, 1L), .Label = c("1",
"2", "3", "na"), class = "factor"), Project.3 = structure(c(1L,
4L, 3L, 4L, 3L, 4L, 2L), .Label = c("1", "2", "3", "na"), class = "factor"),
Project.4 = structure(c(3L, 3L, 2L, 1L, 3L, 3L, 3L), .Label = c("1",
"2", "na"), class = "factor"), Project.5 = structure(c(4L,
1L, 4L, 2L, 1L, 4L, 3L), .Label = c("1", "2", "3", "na"), class = "factor"),
Project.6 = structure(c(3L, 1L, 3L, 2L, 3L, 1L, 3L), .Label = c("2",
"3", "na"), class = "factor"), Project.7 = structure(c(3L,
2L, 3L, 3L, 3L, 1L, 3L), .Label = c("1", "3", "na"), class = "factor")), .Names = c("Student.ID",
"Project.1", "Project.2", "Project.3", "Project.4", "Project.5",
"Project.6", "Project.7"), class = "data.frame", row.names = c(NA,
-7L))
3) ??? approach: This should be pretty much related to (2). However, I think it is probably a better/ fairer approach (at least in the setting of the example). The students cannot pick projects, they even don’t know about the projects, but they have rate their qualifications (1 “not existent” to 10 “professional level”) with regards to a certain skillset. Further, the lecturer has rated the required skillset for every project. In addition to (2), a first step would be to calculate a similarity matrix and then to run the optimization routine from above.
PS: Programming Skills
SK: Statistical Knowledge
IE: Industry Experience
ID PS SK IE
1 10 9 8
2 1 2 10
3 10 2 5
4 2 5 3
5 10 2 10
6 1 10 1
7 5 5 5
--
d.3a <- structure(list(Student.ID = 1:7, Programming.Skills = c(10L, 1L,
10L, 2L, 10L, 1L, 5L), Statistical.knowlegde = c(9L, 2L, 2L,
5L, 2L, 10L, 5L), Industry.Expertise = c(8L, 10L, 5L, 3L, 10L,
1L, 5L)), .Names = c("Student.ID", "Programming.Skills", "Statistical.knowlegde",
"Industry.Expertise"), class = "data.frame", row.names = c(NA,
-7L))
--
T: Topic ID
PS: Programming Skills
SK: Statistical Knowledge
IE: Industry Experience
T PS SK IE
1 10 5 1
2 1 1 5
3 10 10 10
4 2 8 3
5 4 3 2
6 1 1 1
7 5 7 2
--
d.3b <- structure(list(Project.ID = 1:7, Programming.Skills = c(10L,
1L, 10L, 2L, 4L, 1L, 5L), Statistical.Knowlegde = c(5L, 1L, 10L,
8L, 3L, 1L, 7L), Industry.Expertise = c(1L, 5L, 10L, 3L, 2L,
1L, 2L)), .Names = c("Project.ID", "Programming.Skills", "Statistical.Knowlegde",
"Industry.Expertise"), class = "data.frame", row.names = c(NA,
-7L))
I would appreciate any help in implementing those 3 approaches in R. Thank you.
UPDATE: The following questions seem to be related, but none show how to solve it in R: https://math.stackexchange.com/questions/132829/group-membership-assignment-by-preferences-optimization-problem https://superuser.com/questions/467577/using-optimization-to-assign-by-preference