Combining scenario to Replace Medians by Groups in R

Posted 2019-11-05 04:54

I have a dataset:

mydat <- 
structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("52382MCK", 
"52499MCK"), class = "factor"), item = c(11709L, 11709L, 11709L, 
11709L, 11708L, 11708L, 11708L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11202L, 11203L, 11203L, 11204L, 11204L, 11205L, 11205L
), sales = c(30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 
15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 
20L, 15L, 2L, 10L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), action = c(0L, 
1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
1L, 1L)), row.names = c(NA, -35L), class = "data.frame")
# coerce to data.table
library(data.table)
setDT(mydat)

With this dataset I need to perform several operations.

1. Selecting the scenario by groups.

There is an action column. It can take only two values: zero (0) or one (1).

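The scenario is the number of zeros in the action column before the first category of ones, together with the number of zeros after that category.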

For example:
52382MCK    11709

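This is the scenario where we have one zero before the first category of ones in the action column and two zeros after it. Note: there can also be a case where we have two zeros before the first category of ones in the action column and one zero after it.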

mydat1

code        item    sales   action
52382MCK    11709   30      0
52382MCK    11709   10      1
52382MCK    11709   20      0
52382MCK    11709   15      0

To detect this scenario I use this script (it works very well, thanks to @Uwe):

library(data.table)
library(magrittr)

max_zeros <- 3
# runs of 0 to max_zeros zeros: "", "0", "00", "000"
zeros <- sapply(0:max_zeros, stringr::str_dup, string = "0")
names(zeros) <- as.character(nchar(zeros))
# one row per combination of zeros before/after a run of ones,
# e.g. scenario "1-2" with pattern "01+(?=00)"
sc <- CJ(zeros.before = zeros, zeros.after = zeros)[
  , scenario.name := paste(nchar(zeros.before), nchar(zeros.after), sep = "-")][
    , action.pattern := sprintf("%s1+(?=%s)", zeros.before, zeros.after)][]
# special case: all zeros
sc0 <- data.table(
  zeros.before = NA,
  zeros.after = NA, 
  scenario.name = "no1", 
  action.pattern = "^0+$")
sc <- rbind(sc0, sc)

Then:

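setDT(mydat)
# for each code/item group, take the last scenario in sc whose pattern
# matches the group's action string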
class <- mydat[, .(scenario.name = sc$scenario.name[
  paste(action, collapse = "") %>% 
    stringr::str_count(sc$action.pattern) %>%
    is_greater_than(0) %>% 
    which() %>% 
    max()
  ]),
  by = .(code, item)][]

class
mydat[class, on = .(code, item)]

This gives me the data with the scenario class of each group.
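To show what the lookup does, here is a base R restatement under stated assumptions: one regex per combination of zeros before and after a run of ones, ordered by zeros before and then zeros after (how I read the sorted output of CJ()), with the all-zeros pattern first; each group gets the name of the last pattern that matches its action string. The names grid, patterns and classify() are hypothetical, and grepl(perl = TRUE) stands in for stringr::str_count(); this is an illustration, not the OP's code.

```r
# Base R restatement (illustration only) of the scenario lookup.
max_zeros <- 3L
# 'after' varies fastest, so rows are ordered by zeros before, then zeros after
grid <- expand.grid(after = 0:max_zeros, before = 0:max_zeros)
scenario_names <- c("no1", paste(grid$before, grid$after, sep = "-"))
patterns <- c("^0+$",
              sprintf("%s1+(?=%s)", strrep("0", grid$before), strrep("0", grid$after)))

classify <- function(action) {
  s <- paste(action, collapse = "")                    # e.g. "0100"
  # TRUE for every pattern found somewhere in the action string
  hits <- vapply(patterns, grepl, logical(1), x = s, perl = TRUE)
  scenario_names[max(which(hits))]                     # last matching scenario
}

classify(c(0L, 1L, 0L, 0L))   # item 11709
```

Because the patterns with fewer zeros also match ("1+(?=0)" matches "0100" as well), taking the last match picks the most specific scenario; for item 11709 that is "1-2".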

2. The second operation is replacing values with the median.

For each scenario, the median is computed over the zero categories.

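For example, I need to take the 1 zero before the category of ones in the action column and the 2 zeros after it, and compute the median of their sales. The replacement with the median is done only in the sales column, and only for the first category of ones in the action column. If the median is greater than the sales value, the value is not replaced.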
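A minimal base R sketch of this rule for a single code/item group, assuming the group holds exactly one run of ones and that zeros_before and zeros_after zero rows really exist around it. The function name replace_median() is hypothetical; it is not the OP's sales_action().

```r
# Hypothetical helper, not the OP's sales_action(): apply the median rule to a
# single code/item group. Assumes exactly one run of ones and that the
# requested numbers of zero rows exist before and after it.
replace_median <- function(sales, action, zeros_before, zeros_after) {
  ones  <- which(action == 1L)
  first <- min(ones)   # first row of the run of ones
  last  <- max(ones)   # last row of the run of ones
  # positions of the zero rows just before and just after the run
  idx <- c(first - rev(seq_len(zeros_before)), last + seq_len(zeros_after))
  med <- median(sales[idx])
  out <- as.double(sales)
  # keep sales when the median is higher, otherwise use the median
  out[ones] <- pmin(out[ones], med)
  out
}

replace_median(c(30L, 10L, 20L, 15L), c(0L, 1L, 0L, 0L), 1L, 2L)  # item 11709
replace_median(c(2L, 10L, 3L), c(0L, 1L, 0L), 1L, 1L)             # item 11708
```

For item 11709 the median of 30, 20 and 15 is 20, which is above the action-row sales of 10, so nothing changes; for item 11708 the median of 2 and 3 is 2.5, which replaces the 10.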

To do this I use the script:

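sales_action <- function(DF, zeros_before, zeros_after) {
  library(data.table)
  library(magrittr)
  # regex for this scenario, e.g. "01+(?=00)" for 1 zero before and 2 after
  action_pattern <- 
    do.call(sprintf, 
            c(fmt = "%s1+(?=%s)", 
              stringr::str_dup("0", c(zeros_before, zeros_after)) %>% as.list()
            ))
  message("Action pattern used: ", action_pattern)
  # row numbers to translate match positions within a group into rows of DF
  setDT(DF)[, rn := .I]
  # locate the pattern in each group's action string; the lookahead is not
  # part of the match, so the end is extended by the number of trailing zeros
  tmp <- DF[, paste(action, collapse = "") %>% 
              stringr::str_locate_all(action_pattern) %>% 
              as.data.table() %>% 
              lapply(function(x) rn[x]),
            by = .(code, item)][
              , end := end + zeros_after]
  # median of the zero-action sales inside each matched window
  DF[tmp, on = .(code, item, rn >= start, rn <= end), 
     med := as.double(median(sales[action == 0])), by = .EACHI][
       # copy sales to output, then cap the action rows at the median;
       # na.rm = TRUE keeps sales for action rows outside any matched window
       , output := as.double(sales)][action == 1, output := pmin(sales, med, na.rm = TRUE)][
         , c("rn", "med") := NULL][]
}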
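The step end := end + zeros_after is needed because a lookahead (?=...) only checks that the trailing zeros are present; they are not part of the match. A small base R illustration with regexpr(perl = TRUE), which I use here in place of stringr::str_locate_all() (both report zero-width lookaheads the same way); the variable names are hypothetical.

```r
# Base R illustration (not part of sales_action()): the lookahead is zero-width.
action_string <- paste(c(0L, 1L, 0L, 0L), collapse = "")              # "0100"
pattern <- sprintf("%s1+(?=%s)", strrep("0", 1L), strrep("0", 2L))   # "01+(?=00)"
m <- regexpr(pattern, action_string, perl = TRUE)
start <- as.integer(m)                           # match starts at position 1
end <- start + attr(m, "match.length") - 1L      # only "01" is matched: end 2
end_with_zeros_after <- end + 2L                 # extended to cover "00": end 4
```

Only the leading zeros belong to the match, so start already includes the zeros before, while the zeros after must be added explicitly.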

Then:

sales_action(mydat, 1L, 2L)

and I get the result.

The problem is the following:

Each time, I have to enter by hand the scenario for which the median is replaced:

sales_action(mydat, 1L, 2L)
sales_action(mydat, 3L, 1L)
sales_action(mydat, 2L, 2L)

and so on.

How can the median replacement be performed automatically for all possible scenarios, so that I do not have to write sales_action(mydat, .L, .L) each time?

So, for example, the output:

code        item    sales   action  output  pattern
52382MCK    11709   30      0       30      01+00
52382MCK    11709   10      1       10      01+00
52382MCK    11709   20      0       20      01+00
52382MCK    11709   15      0       15      01+00
52382MCK    1170    8       0       8       01+00
52382MCK    1170    10      1       8       01+00
52382MCK    1170    2       0       2       01+00
52382MCK    1170    15      0       15      01+00

Answer 1:

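If I understand correctly, the OP wants to analyse the success of sales actions by comparing sales during an action with the median sales of the periods before and after that sales action.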

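There are some challenges:

  1. There may be more than one sales action per code/item group.
  2. The available data may cover fewer than the requested 3 days before and after each sales action.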

IMHO, the scenarios were introduced as a workaround to deal with issue 2.

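The approach below

  • identifies the sales actions within each code/item group,
  • picks up to three zero-action rows before and up to three after each sales action,
  • computes the median sales of those rows, and
  • updates output in case the sales figure of a sales action exceeds the median of the surrounding zero-action rows.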

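The term category was coined by the OP to distinguish between sales action periods (consecutive streaks of action == 1L) and the zero-action periods before and after them.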
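The four steps can be sketched in base R for the sales and action vectors of a single code/item group. This is only an illustration, assuming the rows are in time order; the answer's data.table code does the same for all groups at once with joins, and the name update_actions() is hypothetical. Categories are numbered with cumsum() over changes in action, which gives the same run numbers as rleid() within a group.

```r
# Hypothetical base R sketch of the four steps for ONE code/item group.
update_actions <- function(sales, action, n_rows = 3L) {
  # category = run of equal action values (1, 1, 1, 2, 3, ...)
  cat_id <- cumsum(c(TRUE, diff(action) != 0))
  output <- as.double(sales)
  for (k in unique(cat_id[action == 1L])) {          # each sales action
    # runs alternate, so categories k - 1 and k + 1 are zero-action periods;
    # head()/tail() simply return fewer rows when fewer are available
    before <- tail(sales[cat_id == k - 1L], n_rows)
    after  <- head(sales[cat_id == k + 1L], n_rows)
    zeros  <- c(before, after)
    if (length(zeros) == 0L) next                    # no surrounding zero rows
    med  <- median(zeros)
    rows <- cat_id == k
    output[rows] <- pmin(output[rows], med)          # only lowers, never raises
  }
  output
}
```

Called on the sales and action columns of group 52382MCK, 11710, it caps all three sales actions at 10, in line with the result of the joins in this answer; a group such as 52499MCK, 11205 with no zero rows at all keeps its sales.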

library(data.table)
# coerce to data.table and create categories
setDT(mydat)[, cat := rleid(action), by = .(code, item)][]

# extract action categories, identify preceding & succeeding zero action categories
mycat <- mydat[, .(action = first(action)), by = .(code, item, cat = cat)][
  , `:=`(before = cat - 1L, after = cat + 1L)][action == 1L]

mycat

       code  item cat action before after
1: 52382MCK 11709   2      1      1     3
2: 52382MCK 11708   2      1      1     3
3: 52382MCK 11710   2      1      1     3
4: 52382MCK 11710   4      1      3     5
5: 52382MCK 11710   6      1      5     7
6: 52499MCK 11203   2      1      1     3
7: 52499MCK 11205   1      1      0     2

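Note that group 52382MCK, 11710 contains three separate sales actions. before and after may point to non-existent values of cat, but this is taken care of automatically in the subsequent joins.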

# compute median of surrounding zero action categories
action_cat_median <- 
  rbind(
    # get sales from up to 3 zero action rows before action category
    mydat[mycat, on = .(code, item, cat = before), 
          .(sales = tail(sales, 3), i.cat), by =.EACHI],
    # get sales from up to 3 zero action rows after action category
    mydat[mycat, on = .(code, item, cat = after), 
          .(sales = head(sales, 3), i.cat), by =.EACHI]
  )[
    # remove empty groups
    !is.na(sales)][
      # compute median for each action category
      , .(med = as.double(median(sales))), by = .(code, item, cat = i.cat)]

action_cat_median
       code  item cat  med
1: 52382MCK 11709   2 20.0
2: 52382MCK 11708   2  2.5
3: 52382MCK 11710   2 10.0
4: 52382MCK 11710   4 10.0
5: 52382MCK 11710   6 10.0
6: 52499MCK 11203   2  2.0
# prepare result
mydat[, output := as.double(sales)][
  # update join
  action_cat_median, on = .(code, item, cat), output := pmin(sales, med)]

Edit: Alternatively, the call to pmin() can be replaced by a non-equi update join which updates only the rows where sales exceed the median:

# prepare result, alternative approach
mydat[, output := as.double(sales)][
  # non-equi update join
  action_cat_median, on = .(code, item, cat, output > med), output := med]
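Both versions give the same result because overwriting only the rows where output > med is exactly what capping with pmin() does. A small base R check, using as illustrative inputs the action-row sales and medians of groups 11709, 11710 (cat 2) and 11708 from above; the variable names are hypothetical.

```r
# Base R check (illustration only): pmin() versus "overwrite where output > med".
output <- c(10, 15, 10)    # sales of three action rows (11709, 11710, 11708)
med    <- c(20, 10, 2.5)   # medians of their surrounding zero-action rows

via_pmin <- pmin(output, med)

via_update <- output
hit <- output > med        # the rows the non-equi join would match
via_update[hit] <- med[hit]
```

Both yield 10, 10, 2.5; the non-equi join simply skips the rows whose sales are already at or below the median.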


mydat
        code  item sales action cat output
 1: 52382MCK 11709    30      0   1   30.0
 2: 52382MCK 11709    10      1   2   10.0
 3: 52382MCK 11709    20      0   3   20.0
 4: 52382MCK 11709    15      0   3   15.0
 5: 52382MCK 11708     2      0   1    2.0
 6: 52382MCK 11708    10      1   2    2.5
 7: 52382MCK 11708     3      0   3    3.0
 8: 52382MCK 11710    30      0   1   30.0
 9: 52382MCK 11710    10      0   1   10.0
10: 52382MCK 11710    20      0   1   20.0
11: 52382MCK 11710    15      1   2   10.0
12: 52382MCK 11710     2      0   3    2.0
13: 52382MCK 11710    10      0   3   10.0
14: 52382MCK 11710     3      0   3    3.0
15: 52382MCK 11710    30      0   3   30.0
16: 52382MCK 11710    10      0   3   10.0
17: 52382MCK 11710    20      0   3   20.0
18: 52382MCK 11710    15      1   4   10.0
19: 52382MCK 11710     2      0   5    2.0
20: 52382MCK 11710    10      0   5   10.0
21: 52382MCK 11710     3      0   5    3.0
22: 52382MCK 11710    30      0   5   30.0
23: 52382MCK 11710    10      0   5   10.0
24: 52382MCK 11710    20      0   5   20.0
25: 52382MCK 11710    15      1   6   10.0
26: 52382MCK 11710     2      0   7    2.0
27: 52382MCK 11710    10      0   7   10.0
28: 52382MCK 11710     3      0   7    3.0
29: 52499MCK 11202     2      0   1    2.0
30: 52499MCK 11203     2      0   1    2.0
31: 52499MCK 11203     2      1   2    2.0
32: 52499MCK 11204     2      0   1    2.0
33: 52499MCK 11204     2      0   1    2.0
34: 52499MCK 11205     2      1   1    2.0
35: 52499MCK 11205     2      1   1    2.0
        code  item sales action cat output

The following rows have been updated:

mydat[output != sales]
       code  item sales action cat output
1: 52382MCK 11708    10      1   2    2.5
2: 52382MCK 11710    15      1   2   10.0
3: 52382MCK 11710    15      1   4   10.0
4: 52382MCK 11710    15      1   6   10.0


Source: Combining scenario to Replace Medians by Groups in R