Pig, mail # user - Question about UDFs and tuple ordering

Re: Question about UDFs and tuple ordering
Russell Jurney 2012-10-05, 16:36
You can write an EvalFunc UDF that depends on a sort, and there are
several in piggybank that do so. COR (the correlate UDF) is one such
example. You call these UDFs on a relation after ordering it.

For example:

answers = foreach (group data by key) {
  sorted = order data by value;
  generate my_udf(sorted.field1, sorted.field2);
};

If I remember correctly, you can in fact also do this:

sorted = order data by field;
answer = foreach sorted generate my_udf(sorted.field, sorted.other_field);

Although strictly speaking, Pig doesn't guarantee that a sort is
maintained outside of a nested foreach block ({}).
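
To make "depends on a sort" concrete, here is a rough sketch of the kind of order-dependent logic such a UDF might run. It is plain Java with no Pig dependency so it can be run on its own; SortedDeltas and deltas are hypothetical names, not anything in Pig or piggybank. In a real UDF, I'd expect the same loop to sit inside exec(Tuple) of an EvalFunc and walk the sorted DataBag instead of a List.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pig or piggybank.
class SortedDeltas {

    // Returns the gap between each value and the one before it.
    // The result is only meaningful if the caller has already sorted
    // the input, which is the assumption an order-dependent UDF makes.
    static List<Double> deltas(List<Double> sortedValues) {
        List<Double> out = new ArrayList<>();
        for (int i = 1; i < sortedValues.size(); i++) {
            out.add(sortedValues.get(i) - sortedValues.get(i - 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Input is assumed to be sorted ascending, as it would be after
        // "order data by value" inside the nested foreach.
        System.out.println(deltas(Arrays.asList(1.0, 4.0, 9.0))); // prints [3.0, 5.0]
    }
}
```

If the same values arrive unsorted, the function still runs but the gaps come out wrong, which is why the ordering has to happen right before the UDF call.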

I can't help with the JOIN; I don't know about that. But check Pig's
bloom filter: http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/Bloom.html

Russell Jurney twitter.com/rjurney
On Oct 5, 2012, at 11:46 AM, Brian Stempin <[EMAIL PROTECTED]> wrote:

> Hi,
> I'm fairly new to writing UDFs and Pig in general.  I want to be able to write a UDF that can take advantage of MapReduce's sorting of data.  Specifically, I'm trying to conceive how I'd write a UDF to do a specialized join or a pivot. In both cases, sorting would be useful.  EvalFunc seems to give no guarantees about ordering of tuples that are passed in.
> Is there any way to do such things as a UDF?
> TIA for your help,
> Brian Stempin
> Machine Learning Engineer
> ColdLight Solutions, LLC