Pig >> mail # user >> Custom joins


Re: Custom joins
Join on a dummy key or use CROSS, then pass the token and the text to a UDF that checks whether the token is contained in the text.
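
A minimal sketch of the CROSS variant, reusing the relation and field names from the question below. As an assumption for illustration, Pig's built-in INDEXOF stands in for the custom UDF; it does a plain substring test, so a token such as "frog" would also match a document containing "frogs". The input files are assumed to be tab-delimited, which is what the default loader expects.

```pig
-- Load the two relations exactly as in the question.
querys    = LOAD 'query'    AS (id:int, token:chararray);
documents = LOAD 'document' AS (id:int, text:chararray);

-- Pair every query with every document (cartesian product).
pairs = CROSS querys, documents;

-- Keep only the pairs where the token occurs somewhere in the text.
-- INDEXOF returns -1 when the substring is not found.
matches = FILTER pairs BY INDEXOF(documents::text, querys::token, 0) >= 0;

DUMP matches;
```

On the sample data this should keep the four pairs listed in the question, in no guaranteed order. CROSS materializes every query/document combination, so this is only reasonable when at least one of the relations is small; a custom UDF could replace INDEXOF in the FILTER if whole-token matching is needed.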

Russell Jurney
twitter.com/rjurney
[EMAIL PROTECTED]
datasyndrome.com

On Aug 29, 2012, at 4:56 PM, Mat Kelcey <[EMAIL PROTECTED]> wrote:

> Hello!
>
> Considering the following two relations...
>
> grunt> querys = load 'query' as (id:int, token:chararray);
> grunt> dump querys;
> (11,foo)
> (12,bar)
> (13,frog)
>
> and
>
> grunt> documents = load 'document' as (id:int, text:chararray);
> grunt> dump documents;
> (21,foo bar frog)
> (22,hello frog)
>
> Is it possible to do a join where the query:token is not equal to but
> contained in documents:text?
>
> eg
> (11,foo,21,foo bar frog)
> (12,bar,21,foo bar frog)
> (13,frog,21,foo bar frog)
> (13,frog,22,hello frog)
>
> I can certainly do this in Java map/reduce (as we all had to in the
> dark days before pig) but is there a way to hack this together
> with a custom udf or some other weird join backdoor (custom
> partitioner for a group or something wacky)?
>
> It's been a long day, maybe I'm just missing something super obvious...
>
> Cheers!
> Mat