Using the magic.wand function for a block of code

Posted 2019-09-05 18:02

Question:

I want to use the plyrmr package while keeping my existing code written in dplyr, so I want to use the "magic.wand" function. For simplicity I am using the "mtcars" dataset; its path on HDFS (Hadoop Distributed File System) is "/user/sgerony/mtcars2".

The block of code contains both base R functions and dplyr functions. This is my code:

magic.wand(rename,TRUE)
filename <- "/user/sgerony/mtcars"
complex.function = function(x){
  x$carb <- x[,ncol(x)]*2 
  x$carb <- x$carb+2
  x <- as.data.frame(rename(x, lol=carb))
  return(x)
}
magic.wand(complex.function)
# does NOT work
input(filename) %|% complex.function()

Result (note: the row names have disappeared):

    mpg cyl  disp  hp drat    wt  qsec vs am gear lol
1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4  10
2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4  10
3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4   4
4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3   4
5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3   6
6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3   4
7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3  10
8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4   6
9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4   6
10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4  10
11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4  10
12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3   8
13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3   8
14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3   8
15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3  10
16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3  10
17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3  10
18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4   4
19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4   6
20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4   4
21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3   4
22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3   6
23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3   6
24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3  10
25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3   6
26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4   4
27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5   6
28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5   6
29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5  10
30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5  14
31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5  18
32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4   6
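Before wrapping a function with magic.wand, it can help to run it on an ordinary in-memory data frame, so that logic errors are separated from Hadoop issues. The sketch below is a base-R variant of complex.function: renaming through names() is swapped in for dplyr::rename so it runs without extra packages, the name complex.function.local is hypothetical, and it assumes, as the original does, that carb is the last column of mtcars.

```r
# Base-R variant of complex.function for a local sanity check.
# Assumption: carb is the last column, as in the built-in mtcars.
complex.function.local <- function(x) {
  x$carb <- x[, ncol(x)] * 2             # double the last column (carb)
  x$carb <- x$carb + 2                   # then add 2
  names(x)[names(x) == "carb"] <- "lol"  # rename carb -> lol without dplyr
  x
}

res <- complex.function.local(mtcars)
head(res, 3)
```

Run locally, these base data-frame operations keep the row names (e.g. "Mazda RX4"), which gives a reference point for the HDFS output above.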

Question 1: Is this the right way to do it? That is, do I have to call magic.wand first on the dplyr functions and then on the block of code?

Question 2: Why can't I call the magic.wand function like this?

magic.wand(dplyr::rename,TRUE)

Result:

> magic.wand(dplyr::rename,TRUE)
Error in match.fun(paste0(f.name, "_")) : 
  'paste0(c("::", "dplyr", "rename"), "_")' is not a function, character or symbol

Details: it creates functions named "::", "::.data.frame", "::.default", "plyr", "rename".

Isn't it necessary to make sure we are not using a function whose name exists in several packages?
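The error text hints at what is happening: the name passed to magic.wand appears to be derived from the unevaluated argument (an inference from the message, not from the plyrmr source). A bare name is a single symbol, but dplyr::rename is a call to the `::` function, and converting that call to character yields three strings. The helper name_of below is hypothetical and only mimics that step with substitute(); it does not need dplyr installed because the argument is never evaluated.

```r
# Hypothetical helper: capture the unevaluated argument as character,
# which is what the error message suggests magic.wand does internally.
name_of <- function(f) as.character(substitute(f))

name_of(rename)         # a symbol: "rename"
name_of(dplyr::rename)  # a call to `::`: "::" "dplyr" "rename"

# Appending "_" to each element gives the vector seen in the error message:
# "::_" "dplyr_" "rename_"
paste0(name_of(dplyr::rename), "_")
```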

Question 3: Why do I have to pass "TRUE" as the second argument of the first magic.wand call but not of the last one?

filename<-"/user/sgerony/mtcars"
magic.wand(rename,TRUE)
filename <- "/user/sgerony/mtcars"
complex.function = function(x){
  x$carb <- x[,ncol(x)]*2 
  x$carb <- x$carb+2
  x <- as.data.frame(rename(x, lol=carb))
  return(x)
}
magic.wand(complex.function,TRUE)

Error:

Error in get(as.character(FUN), mode = "function", envir = envir) : 
object 'complex.function_' of mode 'function' was not found
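The message names complex.function_, which suggests that with TRUE as the second argument magic.wand looks for a companion function with a trailing underscore. In the dplyr releases of that period, rename_ was the standard-evaluation counterpart of rename, so magic.wand(rename, TRUE) could find it, while nothing called complex.function_ exists. The hypothetical check below only reproduces that lookup in base R; help(magic.wand) remains the authoritative description of the argument.

```r
# Hypothetical check: does a function named "<name>_" exist?
# This mirrors the get(..., mode = "function") lookup in the error message.
has_underscore_twin <- function(name, envir = globalenv()) {
  exists(paste0(name, "_"), mode = "function", envir = envir)
}

complex.function <- function(x) x        # stand-in for the real function
has_underscore_twin("complex.function")  # FALSE: no complex.function_ defined
```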

Question 4: What if my block of code uses the dplyr pipe operator? Namely:

complex.function = function(x){
  x$carb <- x[,ncol(x)]*2
  x$carb <- x$carb+2
  x <- as.data.frame(x %>% rename(lol=carb))
  return(x)
}

Should I just replace "%>%" with the plyrmr pipe operator, "%|%"?
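Inside the function body, %>% is magrittr's pipe applied to the data frame the function receives: x %>% rename(lol = carb) is rewritten to rename(x, lol = carb). The sketch below, which assumes dplyr is installed, checks locally that the two spellings give the same result; the function names with_pipe and without_pipe are hypothetical. Going by the answer below, keeping %>% inside the body should work, while %|% is plyrmr's pipe for the outer step on input(filename).

```r
library(dplyr)

# Same body as complex.function, once with the pipe and once without.
with_pipe <- function(x) {
  x$carb <- x[, ncol(x)] * 2 + 2
  as.data.frame(x %>% rename(lol = carb))
}
without_pipe <- function(x) {
  x$carb <- x[, ncol(x)] * 2 + 2
  as.data.frame(rename(x, lol = carb))
}

identical(with_pipe(mtcars), without_pipe(mtcars))  # TRUE
```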

Question 5: Should I call magic.wand on dplyr functions that have plyrmr equivalents, such as "group_by"?

Question 6: Why do I get an error when using as.POSIXct?

magic.wand(mutate,TRUE)
filename <- "/user/sgerony/mtcars"
complex.function = function(x){
  x$carb <- x[,ncol(x)]*2 
  x$carb <- x$carb+2
  x <- as.data.frame(mutate(x,date.time=as.POSIXct("2014-01-01 03:15")))
  return(x)
}
magic.wand(complex.function)

#Works
mtcars %|% complex.function()
# does NOT work
input(filename) %|% complex.function()
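The answer below treats this as a general R question rather than a plyrmr one. One thing worth ruling out locally is time-zone dependence: without tz, as.POSIXct interprets the string in the session's local time zone, which can differ between the client and the cluster nodes. The sketch below uses base R (column assignment swapped in for dplyr's mutate; add_timestamp is a hypothetical name) and pins both the format and the time zone, with UTC as an assumed choice. Whether this resolves the HDFS error is not established here.

```r
# Parse the timestamp with an explicit format and time zone (UTC assumed),
# so the result does not depend on the machine's local time zone.
add_timestamp <- function(x) {
  x$carb <- x[, ncol(x)] * 2
  x$carb <- x$carb + 2
  x$date.time <- as.POSIXct(rep("2014-01-01 03:15", nrow(x)),
                            format = "%Y-%m-%d %H:%M", tz = "UTC")
  x
}

res <- add_timestamp(mtcars)
class(res$date.time)           # "POSIXct" "POSIXt"
as.numeric(res$date.time[1])   # 1388546100 seconds since 1970-01-01 UTC
```

If the column still causes trouble on the cluster, storing the timestamp as character and converting it back after reading is one fallback to try, though it is a suggestion rather than a confirmed fix.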

I realize this is a big question, so thanks for trying to help.

Answer 1:

  1. a. No, as the error message shows. b. No.
  2. a. Why not? b. No. There are plenty of explanations of namespaces in R elsewhere.
  3. help(magic.wand) explains that. If you tell me it's unclear for some reason, I'll try to do better, but cutting and pasting Rd docs is against the rules.
  4. It should work, but I am not going to support it.
  5. There are native functions in plyrmr that do what dplyr functions do (and already use them). The right use for magic.wand is to make custom functions like complex.function Hadoop-aware.
  6. That's unrelated to plyrmr, your best bet is to post a separate question.

Let me give it a try:

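library(plyrmr)  # provides magic.wand, input and %|%
library(dplyr)   # provides rename

complex.function = function(x){
  # carb is the last column of mtcars
  x$carb <- x[,ncol(x)]*2
  x$carb <- x$carb+2
  rename(x, lol=carb)
}
magic.wand(complex.function)
input(mtcars) %|% complex.function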

Please note: only one magic.wand call is needed (the other one shouldn't hurt, but it is redundant), and some crud has been removed from complex.function. Works for me.