What is the best way to define custom methods on a

Posted 2019-01-22 15:54

Question:

I need to define custom methods on DataFrame. What is the best way to do it? The solution should scale, as I intend to define a significant number of custom methods.

My current approach is to create a class (say MyClass) that takes a DataFrame as a constructor parameter, define my custom method (say customMethod) in it, and define an implicit method that converts a DataFrame to MyClass.

implicit def dataFrametoMyClass(df: DataFrame): MyClass = new MyClass(df)

Thus I can call:

dataFrame.customMethod()

Is this the correct way to do it? I am open to suggestions.

Answer 1:

Your way is the way to go (see [1]). Even though I solved it a little differently, the approach is similar:

Possibility 1

Implicits

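import scala.language.implicitConversions
import org.apache.spark.sql.DataFrame

object ExtraDataFrameOperations {
  object implicits {
    // Implicit conversion that makes the extra methods available on any DataFrame
    implicit def dFWithExtraOperations(df: DataFrame): DFWithExtraOperations = DFWithExtraOperations(df)
  }
}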

case class DFWithExtraOperations(df: DataFrame) {
  def customMethod(param: String) : DataFrame = {
    // do something fancy with the df
    // or delegate to some implementation
    //
    // here, just as an illustrating example: do a select
    df.select( df(param) )
  }
}

Usage

To use the new customMethod method on a DataFrame:

import ExtraDataFrameOperations.implicits._
val df = ...
val otherDF = df.customMethod("hello")

Possibility 2

Instead of using an implicit method (see above), you can also use an implicit class:

Implicit class

object ExtraDataFrameOperations {
  implicit class DFWithExtraOperations(df: DataFrame) {
    def customMethod(param: String): DataFrame = {
      // do something fancy with the df
      // or delegate to some implementation
      //
      // here, just as an illustrating example: do a select
      df.select( df(param) )
    }
  }
}

Usage

import ExtraDataFrameOperations._
val df = ...
val otherDF = df.customMethod("hello")

Remark

To avoid the additional import, turn the object ExtraDataFrameOperations into a package object and store it in a file called package.scala within your package.
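The remark above could look roughly like the following sketch. The package name com.example.etl and the Report object are placeholders that do not appear in the original answer, and both pieces are shown in one compilation unit only for brevity; normally the package object would sit in its own package.scala file.

```scala
// Single-file sketch; in a real project the package object would normally live in
// com/example/etl/package.scala. The package name com.example.etl is a placeholder.
package com.example {

  import org.apache.spark.sql.DataFrame

  package object etl {
    implicit class DFWithExtraOperations(df: DataFrame) {
      // Same illustrative body as in the answer: select one column by name.
      def customMethod(param: String): DataFrame = df.select(df(param))
    }
  }

  package etl {
    // Any code in com.example.etl sees customMethod without an explicit import.
    object Report {
      def helloColumn(df: DataFrame): DataFrame = df.customMethod("hello")
    }
  }
}
```

Code outside com.example.etl would still need `import com.example.etl._` to pick up the extension methods.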

Official documentation / references

[1] The original blog post "Pimp my library" by M. Odersky is available at http://www.artima.com/weblogs/viewpost.jsp?thread=179766



Answer 2:

There is a slightly simpler approach: just declare MyClass as an implicit class:

implicit class MyClass(df: DataFrame) { def myMethod = ... }

This automatically creates the implicit conversion method (also called MyClass). You can also make it a value class by adding extends AnyVal, which avoids some overhead because the compiler can usually skip creating a MyClass instance at runtime, but this is very unlikely to matter in practice.
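A minimal sketch of the value-class variant, assuming an enclosing object named DataFrameSyntax (hypothetical) and a placeholder body for myMethod, since the original leaves the body elided. Note that a value class must declare its single constructor parameter as a val and must be top-level or a member of a statically accessible object.

```scala
import org.apache.spark.sql.DataFrame

// DataFrameSyntax is a hypothetical name for the enclosing object.
object DataFrameSyntax {
  // `val` on the single parameter and `extends AnyVal` make this a value class.
  implicit class MyClass(val df: DataFrame) extends AnyVal {
    // Placeholder body for illustration only: the number of columns.
    def myMethod: Int = df.columns.length
  }
}
```

After `import DataFrameSyntax._`, calling `someDataFrame.myMethod` resolves through the implicit class just as in the non-value-class version.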

Finally, putting MyClass into a package object will allow you to use the new methods anywhere in this package without requiring import of MyClass, which may be a benefit or a drawback for you.



Answer 3:

I think you should add an implicit conversion between DataFrame and your custom wrapper, but use an implicit class. This should be the easiest to use, and it keeps your custom methods in one common place.

implicit class WrappedDataFrame(val df: DataFrame) {
  def customMethod(arg1: String, arg2: Int): Unit = {
    // ...[do your stuff here]
  }
  // ...[other methods you consider useful, getters, setters, whatever]...
}

If the implicit wrapper is in scope where the DataFrame is used, you can treat an ordinary DataFrame as if it were your wrapper, i.e.:

df.customMethod("test", 100)
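For a complete picture, here is a small end-to-end sketch that runs against a local SparkSession. The object name WrappedDataFrameDemo, the sample data, and the body of customMethod (select the column named by arg1, then keep at most arg2 rows) are assumptions made for illustration; the original answer leaves the method body open.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object WrappedDataFrameDemo {
  // Hypothetical wrapper; the method body is only an illustration.
  implicit class WrappedDataFrame(val df: DataFrame) {
    // Select the column named `arg1` and keep at most `arg2` rows.
    def customMethod(arg1: String, arg2: Int): DataFrame =
      df.select(df(arg1)).limit(arg2)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wrapped-df-demo")
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("test", "n")

    // The implicit class is in scope here, so the call looks like a DataFrame method.
    df.customMethod("test", 100).show()

    spark.stop()
  }
}
```

Returning a DataFrame from each custom method, as in this sketch, lets calls be chained with the built-in DataFrame operations.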