Russell Jurney 2012-10-05, 16:36
You can write an EvalFunc UDF that depends on a sort, and there are
several in piggybank that do so. COR (the correlate UDF) is such an
example. You call these UDFs on a relation after ordering it.

For example:

answers = foreach (group data by key) {
  sorted = order data by value;
  generate my_udf(sorted.field1, sorted.field2);
};

If I remember correctly, you can in fact also do this:

sorted = order data by field;
answer = foreach (group sorted all) generate my_udf(sorted.field, sorted.other_field);

Although strictly speaking, Pig doesn't guarantee a sort is maintained
outside of the {} of a nested foreach.
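
To try the nested-foreach ordering without writing a UDF first, here is a
minimal sketch in which the built-in LIMIT stands in for the order-dependent
UDF. The file name 'data.tsv' and the schema (key, value, field1) are
assumptions made up for illustration; they are not from the original question.

```
-- Hypothetical input: a tab-separated file with columns key, value, field1.
data = load 'data.tsv' as (key:chararray, value:int, field1:chararray);

top_per_key = foreach (group data by key) {
  -- The sort happens inside the nested block, once per group.
  sorted = order data by value desc;
  -- LIMIT consumes the bag in sorted order, so it keeps the 3 largest values.
  top3 = limit sorted 3;
  generate group, flatten(top3);
};

dump top_per_key;
```

If the output shows the three largest values for each key, the ordering is
reaching the operator that follows it in the block. A UDF called in place of
the limit, as in the first example, should receive its bag in that same order.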

I can't help on the JOIN, I don't know about that. But check Pig's
bloom filter: http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/Bloom.html

Russell Jurney twitter.com/rjurney
On Oct 5, 2012, at 11:46 AM, Brian Stempin <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm fairly new to writing UDFs and Pig in general.  I want to be able to write a UDF that can take advantage of MapReduce's sorting of data.  Specifically, I'm trying to conceive how I'd write a UDF to do a specialized join or a pivot. In both cases, sorting would be useful.  EvalFunc seems to give no guarantees about ordering of tuples that are passed in.
> Is there any way to do such things as a UDF?
> TIA for your help,
> Brian Stempin
> Machine Learning Engineer
> ColdLight Solutions, LLC
