I am trying to write a Java UDF with the end goal of extending/overriding the load method of PigStorage to support entries that take multiple lines.
My pig script is as follows:
REGISTER udf.jar;
register 'userdef.py' using jython as parser;
A = LOAD 'test_data' USING PigStorage() AS row:chararray;
C = FOREACH A GENERATE myTOKENIZE.test();
DUMP D;
udf.jar looks like:
udf/myTOKENIZE.class
myTOKENIZE.java imports org.apache.pig.* and extends EvalFunc. The test method just returns a "Hello world" String.
The problem I am having is that when I try to call the method test() of class myTOKENIZE I get ERROR 1070: Could not resolve myTOKENIZE.test using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Thoughts?
As your UDF extends EvalFunc, there should be a method called exec() in the class myTOKENIZE.
Your pig code would then look as follows:
C = FOREACH A GENERATE udf.myTOKENIZE(*);
Please read http://pig.apache.org/docs/r0.7.0/udf.html#How+to+Write+a+Simple+Eval+Function
Hope that helps.
So is myTOKENIZE in the package udf? In that case you'd need
C = FOREACH A GENERATE udf.myTOKENIZE.test();
After way too much time (and coffee) and a bunch of trial and error, I figured out my issue.
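Important note: for a jar such as myudfs.jar, the classes inside must declare a package (here myudfs) and sit in the matching directory inside the jar, and the Pig script must call them by the package-qualified name.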
The corrected code is as follows:
REGISTER myudfs.jar;
register 'userdef.py' using jython as parser;
A = LOAD 'test_data' USING PigStorage() AS row:chararray;
C = FOREACH A GENERATE myudfs.myTOKENIZE('');
DUMP C;
myTOKENIZE.java:
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
public class myTOKENIZE extends EvalFunc<String>
{
    // Pig calls exec() once per input tuple; the first field is upper-cased.
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }
    }
}
The structure of myudfs.jar:
myudfs/myTOKENIZE.class
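The relationship between the package declaration and the jar layout can be sketched in plain Java. This is only an illustration of the standard Java convention for top-level classes; the class name JarEntryPath and the method entryFor are hypothetical and are not part of Pig.

```java
public class JarEntryPath {
    // Converts a fully qualified class name into the entry path
    // that the class file is expected to have inside a jar.
    public static String entryFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Prints: myudfs/myTOKENIZE.class
        System.out.println(entryFor("myudfs.myTOKENIZE"));
    }
}
```

If the entry listed by jar tf does not match the path computed this way, the class will probably not load under its package-qualified name.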
Hopefully this proves useful to someone else with similar issues!
This is very late, but I think the solution is that when you use the UDF in your Pig script you have to give the fully qualified name of the class, including its package.
For example, say my package is
package com.evalfunc.udf;
and Power is my class, declared as
public class Power extends EvalFunc<Integer> {....}
Then in Pig, first register the jar file and then call the UDF with its full package name, like:
record = LOAD '/user/fsbappdev/maitytest/pig/pigudf/power_data' USING PigStorage(',');
pow_result = foreach record generate com.evalfunc.udf.Power(base,exponent);
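The import list printed in the ERROR 1070 message suggests why the package prefix matters: the function name appears to be tried with each listed prefix in turn, so a class in a custom package is only found when the script spells out that package. Below is a simplified sketch of that kind of lookup in plain Java. It is an assumption about the general mechanism, not Pig's actual code; the class UdfNameResolver and the method resolve are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class UdfNameResolver {
    // Prefixes copied from the ERROR 1070 message in the question.
    static final List<String> IMPORTS = Arrays.asList(
        "", "java.lang.", "org.apache.pig.builtin.", "org.apache.pig.impl.builtin.");

    // Returns the first class found by prepending each prefix to the name,
    // or null when no prefix produces a loadable class.
    public static Class<?> resolve(String name) {
        for (String prefix : IMPORTS) {
            try {
                return Class.forName(prefix + name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not found under this prefix; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Found through the "java.lang." prefix.
        System.out.println(resolve("String"));
        // No class with this name exists under any prefix, so nothing is found.
        System.out.println(resolve("myTOKENIZE.test"));
    }
}
```

Under this sketch, com.evalfunc.udf.Power would be found through the empty prefix as long as the registered jar is on the classpath, while a bare Power would not.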