Pig, mail # user - Custom joins


Re: Custom joins
Russell Jurney 2012-08-30, 00:04
Join on a dummy key or use CROSS, then pass the token and the text to a UDF that keeps only the matching pairs.
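
A minimal sketch of the CROSS variant, assuming the relations and schemas from the quoted question below. To stay self-contained it uses Pig's built-in INDEXOF in place of a custom UDF; a UDF of your own would go in the same FILTER position.

```pig
-- Load the two relations exactly as in the question.
querys    = LOAD 'query'    AS (id:int, token:chararray);
documents = LOAD 'document' AS (id:int, text:chararray);

-- Pair every query with every document (full cartesian product).
pairs = CROSS querys, documents;

-- Keep only the pairs where the token occurs somewhere in the text.
-- INDEXOF returns -1 when the search string is not found.
matches = FILTER pairs BY INDEXOF(documents::text, querys::token, 0) >= 0;

DUMP matches;
```

Note that INDEXOF is a substring test, so 'frog' would also match a document containing 'frogs'; whole-word matching would need a custom UDF or a tokenizing step. CROSS materializes every query/document pair, so this gets expensive when both relations are large.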

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Aug 29, 2012, at 4:56 PM, Mat Kelcey <[EMAIL PROTECTED]> wrote:

> Hello!
>
> Considering the following two relations...
>
> grunt> querys = load 'query' as (id:int, token:chararray);
> grunt> dump querys;
> (11,foo)
> (12,bar)
> (13,frog)
>
> and
>
> grunt> documents = load 'document' as (id:int, text:chararray);
> grunt> dump documents;
> (21,foo bar frog)
> (22,hello frog)
>
> Is it possible to do a join where the query:token is not equal to but
> contained in documents:text?
>
> e.g.
> (11,foo,21,foo bar frog)
> (12,bar,21,foo bar frog)
> (13,frog,21,foo bar frog)
> (13,frog,22,hello frog)
>
> I can certainly do this in Java map/reduce (as we all had to in the
> dark days before Pig) but is there a way to hack this together
> with a custom UDF or some other weird join backdoor (a custom
> partitioner for a group or something wacky)?
>
> It's been a long day, maybe I'm just missing something super obvious.
>
> Cheers!
> Mat