The class is used to implement functions to be applied to
fields in a dataset. The function is applied to each Tuple in the set.
The programmer should not make assumptions about state maintained
between invocations of the exec() method since the Pig runtime
will schedule and localize invocations based on information provided
at runtime. The programmer also should not make assumptions about when or
how many times the class will be instantiated, since it may be instantiated
multiple times in both the front and back end.
Utility method that allows a UDF to report progress. If exec() will take more than
a few seconds, PigProgressable.progress() should be called
occasionally to avoid timeouts. The default Hadoop task timeout is 600 seconds.
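Below is a minimal plain-Java sketch of the pattern. It assumes a long loop inside exec(). ProgressReporter is a hypothetical stand-in for the UDF's progress hook and is not a Pig type. Reporting every 10,000 iterations is an arbitrary choice. In a real UDF the interval only needs to keep calls well inside the task timeout.

```java
/**
 * Sketch of a long-running per-tuple computation that reports progress
 * periodically. ProgressReporter is a hypothetical stand-in, not a Pig type.
 */
public class ProgressSketch {

    /** Hypothetical stand-in for the UDF's progress() hook. */
    interface ProgressReporter {
        void progress();
    }

    /** Report progress every this many iterations (an arbitrary choice). */
    static final int REPORT_EVERY = 10_000;

    /** Sums 0..n-1, calling reporter.progress() every REPORT_EVERY iterations. */
    static long slowSum(int n, ProgressReporter reporter) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
            if (i % REPORT_EVERY == 0) {
                reporter.progress(); // signal liveness so the task is not treated as hung
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        long result = slowSum(100_000, () -> calls[0]++);
        System.out.println(result + " " + calls[0]); // 4999950000 10
    }
}
```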
public final void warn(String msg, Enum warningEnum)
Issue a warning. Warning messages are aggregated and reported to the user.
msg - String message of the warning
warningEnum - type of warning
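The sketch below shows the aggregation idea in plain Java. It assumes a division UDF that warns and returns null instead of failing the task. Warning and WarningAggregator are hypothetical names, not Pig classes. Counting warnings per enum type is one plausible way to aggregate them. It is not a description of how Pig itself aggregates.

```java
import java.util.EnumMap;
import java.util.Map;

public class WarnSketch {

    /** Hypothetical warning categories; a real UDF would pass its own enum. */
    enum Warning { DIVIDE_BY_ZERO, NULL_INPUT }

    /** Hypothetical aggregator: counts warnings per type instead of logging each one. */
    static final class WarningAggregator {
        private final Map<Warning, Long> counts = new EnumMap<>(Warning.class);

        void warn(String msg, Warning type) {
            // The message text is ignored here; only the per-type count is kept.
            counts.merge(type, 1L, Long::sum);
        }

        long count(Warning type) {
            return counts.getOrDefault(type, 0L);
        }
    }

    /** A division "UDF" that warns and returns null rather than throwing. */
    static Double divide(Double a, Double b, WarningAggregator agg) {
        if (a == null || b == null) {
            agg.warn("null input", Warning.NULL_INPUT);
            return null;
        }
        if (b == 0.0) {
            agg.warn("division by zero", Warning.DIVIDE_BY_ZERO);
            return null;
        }
        return a / b;
    }

    public static void main(String[] args) {
        WarningAggregator agg = new WarningAggregator();
        double[][] rows = {{6, 3}, {1, 0}, {5, 0}};
        for (double[] r : rows) {
            System.out.println(divide(r[0], r[1], agg)); // 2.0, null, null
        }
        divide(null, 2.0, agg);
        System.out.println(agg.count(Warning.DIVIDE_BY_ZERO) + " " + agg.count(Warning.NULL_INPUT)); // 2 1
    }
}
```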
public void finish()
Placeholder for cleanup to be performed at the end. User-defined functions can override it.
The default implementation is a no-op.
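One common use of finish() is releasing a resource that exec() opened. In the plain-Java sketch below, UpperCaseUdf and Resource are hypothetical names. The resource is created lazily in exec() rather than in the constructor. This is because, as noted above, the class may be instantiated several times on both the front and back end.

```java
import java.util.Locale;

public class FinishSketch {

    /** Hypothetical resource standing in for a file handle or connection. */
    static final class Resource {
        boolean open = true;
        int uses = 0;

        void close() {
            open = false;
        }
    }

    /** Hypothetical UDF shape: lazy setup in exec(), cleanup in finish(). */
    static final class UpperCaseUdf {
        Resource resource; // null until the first exec() call

        String exec(String input) {
            if (resource == null) {
                // Deferred from the constructor, which may also run on the front end.
                resource = new Resource();
            }
            resource.uses++;
            return input == null ? null : input.toUpperCase(Locale.ROOT);
        }

        void finish() {
            if (resource != null) {
                resource.close(); // release once, after the last exec()
            }
        }
    }

    public static void main(String[] args) {
        UpperCaseUdf udf = new UpperCaseUdf();
        System.out.println(udf.exec("pig") + " " + udf.exec("hadoop")); // PIG HADOOP
        udf.finish();
        System.out.println(udf.resource.uses + " " + udf.resource.open); // 2 false
    }
}
```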
This callback method must be implemented by all subclasses. This
is the method that will be invoked on every Tuple of a given dataset.
Since the dataset may be divided up in a variety of ways, the programmer
should not make assumptions about state that is maintained between
invocations of this method.
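The sketch below makes the warning about state concrete. It runs two per-tuple functions over the same rows, first as one split and then as two splits. It creates a fresh instance per split, roughly as separate tasks might. TupleFunc, StringLength, and RowNumber are hypothetical names, and a Tuple is modeled as List<Object>. This is not Pig's EvalFunc API. StringLength gives the same results either way. RowNumber keeps a counter between calls, so its results change with the split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ExecSketch {

    /** Hypothetical stand-in for an EvalFunc; a Tuple is modeled as List<Object>. */
    interface TupleFunc<T> {
        T exec(List<Object> tuple);
    }

    /** Safe: the result depends only on the current tuple. */
    static final class StringLength implements TupleFunc<Integer> {
        @Override
        public Integer exec(List<Object> tuple) {
            Object field = tuple.isEmpty() ? null : tuple.get(0);
            return field == null ? null : field.toString().length();
        }
    }

    /** Unsafe: a counter kept across calls makes results depend on how input is split. */
    static final class RowNumber implements TupleFunc<Integer> {
        private int next = 0;

        @Override
        public Integer exec(List<Object> tuple) {
            return next++;
        }
    }

    /** Runs a fresh function instance over each split, roughly as separate tasks might. */
    static <T> List<T> runSplits(Supplier<TupleFunc<T>> factory, List<List<List<Object>>> splits) {
        List<T> out = new ArrayList<>();
        for (List<List<Object>> split : splits) {
            TupleFunc<T> f = factory.get(); // one instance per split
            for (List<Object> tuple : split) {
                out.add(f.exec(tuple));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(
            List.<Object>of("a"), List.<Object>of("bb"), List.<Object>of("ccc"));
        List<List<List<Object>>> oneSplit = List.of(rows);
        List<List<List<Object>>> twoSplits = List.of(rows.subList(0, 1), rows.subList(1, 3));

        List<Integer> lenA = runSplits(StringLength::new, oneSplit);
        List<Integer> lenB = runSplits(StringLength::new, twoSplits);
        List<Integer> rowA = runSplits(RowNumber::new, oneSplit);
        List<Integer> rowB = runSplits(RowNumber::new, twoSplits);
        System.out.println(lenA + " " + lenB); // [1, 2, 3] [1, 2, 3]
        System.out.println(rowA + " " + rowB); // [0, 1, 2] [0, 0, 1]
    }
}
```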
Allows a UDF to specify type-specific implementations of itself. For example,
an implementation of arithmetic sum might have int and float implementations,
since integer arithmetic performs much better than floating-point arithmetic. Pig's
typechecker will call this method and, using the returned list plus the schema
of the function's input data, decide which implementation of the UDF to use.
The return value is a List of FuncSpec objects, each representing an EvalFunc class
that can handle inputs matching the schema in that FuncSpec. Each
FuncSpec should be constructed with a schema that describes the input for that
implementation. For example, the sum function above would return two elements in its list:
FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)))
FuncSpec(IntSum.class.getName(), new Schema(new Schema.FieldSchema(null, DataType.INTEGER)))
This would indicate that the main implementation is used for doubles, and the special
implementation IntSum is used for ints.
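The selection step can be sketched as follows. The sketch assumes the typechecker simply picks the first FuncSpec whose declared input type matches the actual input type exactly. Spec and FieldType are hypothetical simplifications of FuncSpec and Pig's DataType constants, and class names are plain strings. Pig's real typechecker may also consider implicit casts when no exact match exists, which this sketch omits.

```java
import java.util.List;
import java.util.Optional;

public class ArgMappingSketch {

    /** Hypothetical input type tags, loosely mirroring Pig's DataType constants. */
    enum FieldType { INTEGER, DOUBLE, CHARARRAY }

    /** Hypothetical stand-in for FuncSpec: an implementation class name and the input type it accepts. */
    static final class Spec {
        final String className;
        final FieldType inputType;

        Spec(String className, FieldType inputType) {
            this.className = className;
            this.inputType = inputType;
        }
    }

    /** What a sum UDF's mapping might contain: doubles handled by itself, ints by IntSum. */
    static List<Spec> argToFuncMapping() {
        return List.of(
            new Spec("Sum", FieldType.DOUBLE),
            new Spec("IntSum", FieldType.INTEGER));
    }

    /** Simplified typechecker: first spec whose declared input type matches exactly. */
    static Optional<String> choose(List<Spec> specs, FieldType actual) {
        return specs.stream()
            .filter(s -> s.inputType == actual)
            .map(s -> s.className)
            .findFirst();
    }

    public static void main(String[] args) {
        List<Spec> specs = argToFuncMapping();
        System.out.println(choose(specs, FieldType.INTEGER).orElse("none"));   // IntSum
        System.out.println(choose(specs, FieldType.DOUBLE).orElse("none"));    // Sum
        System.out.println(choose(specs, FieldType.CHARARRAY).orElse("none")); // none
    }
}
```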
public final void setPigLogger(PigLogger pigLogger)
Sets the PigLogger object. Called by Pig to provide the UDF with a reference
to the logger.
pigLogger - PigLogger object.
public org.apache.commons.logging.Log getLogger()
public void setUDFContextSignature(String signature)
This method will be called by Pig both in the front end and back end to
pass a unique signature to the EvalFunc. The signature can be used as a key
for storing in the UDFContext any information that the
EvalFunc needs to carry between method invocations in the
front end and back end.
signature - a unique signature to identify this EvalFunc
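Here is a sketch of the intended pattern. A front-end instance stores a value under its signature, and a separate back-end instance with the same signature reads it back. ContextStore is a hypothetical in-memory stand-in for UDFContext. The real UDFContext has its own API, and Pig ships its contents to the back end. MyUdf, frontEndSetup, and backEndField are also hypothetical. Only setUDFContextSignature mirrors the method documented above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SignatureSketch {

    /** Hypothetical stand-in for UDFContext: one Properties object per signature. */
    static final class ContextStore {
        private final Map<String, Properties> bySignature = new HashMap<>();

        Properties propertiesFor(String signature) {
            return bySignature.computeIfAbsent(signature, k -> new Properties());
        }
    }

    /** Hypothetical UDF that records a setting on the front end and reads it on the back end. */
    static final class MyUdf {
        private final ContextStore store;
        private String signature;

        MyUdf(ContextStore store) {
            this.store = store;
        }

        void setUDFContextSignature(String signature) {
            this.signature = signature;
        }

        /** Front end: e.g. after inspecting the input schema. */
        void frontEndSetup(String inputFieldName) {
            store.propertiesFor(signature).setProperty("field", inputFieldName);
        }

        /** Back end: read what the front end stored under the same signature. */
        String backEndField() {
            return store.propertiesFor(signature).getProperty("field");
        }
    }

    public static void main(String[] args) {
        ContextStore store = new ContextStore(); // shared in memory here; Pig ships this state itself
        MyUdf front = new MyUdf(store);
        front.setUDFContextSignature("sig-1");
        front.frontEndSetup("price");

        // A different instance with the same signature, as on the back end, sees the value.
        MyUdf back = new MyUdf(store);
        back.setUDFContextSignature("sig-1");
        System.out.println(back.backEndField()); // price

        // Another use of the UDF with a different signature does not collide.
        MyUdf other = new MyUdf(store);
        other.setUDFContextSignature("sig-2");
        System.out.println(other.backEndField()); // null
    }
}
```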