R: Recreate historical membership from a list of c

2019-08-17 08:36发布

问题:

Suppose I have the current membership status of a group, i.e. names of members. Additionally, I have data on times when some new member may have been added to the group and / or an old member may have been removed from the group.

The task at hand is to recreate the membership of the group at all these points in time. I've looked around but did not find a ready solution for this problem. Does anybody know an elegant method of doing this?

Reproducible example:

Input:

periods <- 5
indx <- paste0("t-", seq_len(periods))
[1] "t-1" "t-2" "t-3" "t-4" "t-5"

current <- letters[seq_len(10)]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

incoming <- setNames(letters[seq_len(periods) + 5], indx)
incoming[2] <- NA
t-1 t-2 t-3 t-4 t-5
"f"  NA "h" "i" "j"

outgoing <- setNames(letters[seq_len(periods) + 10], indx)
outgoing[4] <- NA
t-1 t-2 t-3 t-4 t-5
"k" "l" "m"  NA "o"

Output:

$current
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

$`t-1`
 [1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k"

$`t-2`
 [1] "a" "b" "c" "d" "e" "g" "h" "i" "j" "k" "l"

$`t-3`
 [1] "a" "b" "c" "d" "e" "g" "i" "j" "k" "l" "m"

$`t-4`
 [1] "a" "b" "c" "d" "e" "g" "j" "k" "l" "m"

$`t-5`
 [1] "a" "b" "c" "d" "e" "g" "k" "l" "m" "o"

Disclaimer: I've written a solution for this which I will be posting as my answer to the question. The intent is to document this problem and a possible solution and to elicit other ingenious and / or existing solutions or improvements.

回答1:

The function create_mem_ts (membership timeseries) will generate the desired output as posted in the question.

create_mem_ts <- function (ctime, added, removed, current) {

  # Create a time-series of membership of a set.

  # Inputs:

  ## ctime:     Time of changes in set.
  ##            An atomic vector of a time-series class or otherwise,
  ##
  ##            interpretable as a time-series in descending order (for e.g.
  ##            `t-1`, `t-2`, `t-3` etc.
  ##
  ##            Is an index of when the changes in membership happened in time.
  ##            Allows repeats but no NAs.

  ## added:     Member(s) added to the set.
  ##            An atomic vector or a list of the same length as ctime.
  ##
  ##            If an atomic vector, represents exactly one member added at
  ##            the corresponding ctime.
  ##
  ##            If a list, represents multiple members added at corresponding
  ##            ctime.

  ## removed:   Member(s) removed from the set.
  ##            An atomic vector or a list of the same length as ctime.
  ##
  ##            If an atomic vector, represents exactly one member removed at
  ##            the corresponding ctime.
  ##
  ##            If a list, represents multiple members removed at the
  ##            corresponding ctime.

  ## current:   Current membership of the set.
  ##            An atomic vector listing the current membership of the set.

  # Output:

  ## A list of the same length as ctime named by values in ctime (coerced to
  ## character by the appropriate method). 

  stopifnot(is.atomic(ctime),
            is.atomic(added) || is.list(added),
            is.atomic(removed) || is.list(removed))

  if (any(is.na(ctime))) stop("NAs not allowed in the ctime.")

  stopifnot(length(ctime) == length(added),
            length(added) == length(removed))

  if (any(duplicated(ctime))) {
    ctime.u <- unique(ctime)
    ctime.f <- factor(ctime, levels=as.character(ctime.u))
    added <- split(added, ctime.f)
    removed <- split(removed, ctime.f)
  } else {
    ctime.u <- ctime
  }

  out <- setNames(vector(mode="list", length=length(ctime.u) + 1),
                  c("current", as.character(ctime.u)))
  out[["current"]] <- current

  for (i in 2:length(out))
    out[[i]] <- union(setdiff(out[[i - 1]], added[[i - 1]]),
                      na.omit(removed[[i - 1]]))

  attr(out, "index") <- ctime.u

  out

}

Moreover, if ctime is a valid time-series class in the function above, the output from that can be used to generate membership on any time-stamp using the function (within the range in ctime) using this function memship_at.

memship_at <- function (mem_ts, at) {

  stopifnot(inherits(at, class(attr(mem_ts, "index"))))

  just.before <- which(at > attr(mem_ts, "index"))[1]

  if (just.before > 1)
    mem_ts[[just.before - 1]]
  else
    mem_ts[[1]]

}


标签: r set