How to access custom UDFs through Spark Thrift Server

Posted 2019-07-24 02:25

Question:

I am running Spark Thrift Server on EMR. I start up the Spark Thrift Server by:

  sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh --queue interactive.thrift --jars /opt/lib/custom-udfs.jar

Notice that I have a custom UDF jar that I want to add to the Thrift Server classpath, which is why I added --jars /opt/lib/custom-udfs.jar to the command above.

Once on the EMR cluster, I issued the following to connect to the Spark Thrift Server:

  beeline -u jdbc:hive2://localhost:10000/default

I was then able to issue commands like show databases. But how do I access the custom UDF? I thought that adding the --jars option to the Thrift Server startup script would also make the jar available as a Hive resource.

The only way I can access the custom UDF right now is by adding the custom UDF jar as a Hive resource:

  add jar /opt/lib/custom-udfs.jar

and then creating a function for the UDF.
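For reference, the per-session workaround looks like this in beeline (the function name and the class name com.example.MyUpper are placeholders for your own UDF):

```sql
-- Run inside each new beeline session:
ADD JAR /opt/lib/custom-udfs.jar;

-- Register the UDF; the class name is a placeholder for your own implementation.
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.MyUpper';
```

Because the function is TEMPORARY, it disappears when the session ends, which is exactly why this has to be repeated every time.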

Question: Is there a way to auto-configure the custom UDF jar so that I don't have to add the jar each time in the Spark session?

Thanks!

Answer 1:

The easiest way is to edit the file start-thriftserver.sh so that, at the end, it:

  1. Waits until the server is ready
  2. Executes the setup SQL queries

You could also post a proposal on the Spark JIRA; "execute setup code at startup" would be a very useful feature.
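A minimal sketch of that approach, appended to the end of start-thriftserver.sh. The port, timeout, and the path /opt/lib/init.sql (a file holding the ADD JAR / CREATE TEMPORARY FUNCTION statements) are assumptions to adjust for your setup:

```shell
#!/bin/bash
# Sketch: run after the existing start-thriftserver.sh logic.
# Assumes the Thrift Server listens on localhost:10000 and that
# /opt/lib/init.sql contains the ADD JAR / CREATE FUNCTION statements.

# 1. Wait until the server accepts connections on port 10000
#    (bash's /dev/tcp pseudo-device is used as a lightweight port probe).
for _ in $(seq 1 60); do
  if (exec 3<>/dev/tcp/localhost/10000) 2>/dev/null; then
    exec 3>&-   # close the probe connection
    break
  fi
  sleep 5
done

# 2. Execute the setup SQL once the server is up.
beeline -u jdbc:hive2://localhost:10000/default -f /opt/lib/init.sql
```

Because start-thriftserver.sh runs on every (re)start, the UDFs get registered automatically and beeline users no longer need to issue add jar themselves. Note that CREATE TEMPORARY FUNCTION is per-session, so for this to help other sessions you would register a permanent function (CREATE FUNCTION ... USING JAR ...) in the init script instead.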