Re: Replicated Join and OOM errors
Pig does not currently have a way to do this. Development of a feature
like this is tracked at https://issues.apache.org/jira/browse/PIG-2784.
Feel free to add a subtask and take a stab at it.
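
Until then, one possible workaround is to make the decision outside Pig:
a wrapper script checks the size of the smaller input and emits the JOIN
with or without USING 'replicated'. A minimal sketch in Python, assuming
the size in bytes has already been obtained by some other means and that
the threshold is tuned by hand. The function name, relation aliases and
key name below are all hypothetical:

```python
def join_statement(small_bytes, threshold_bytes,
                   big="big", small="small", key="id", out="joined"):
    """Build a Pig Latin JOIN statement as a string.

    The replicated variant is chosen only when the smaller input is at or
    below the threshold; otherwise a regular join is emitted.
    All names and the threshold are illustrative assumptions.
    """
    if small_bytes < 0 or threshold_bytes <= 0:
        raise ValueError("sizes must be non-negative and threshold positive")
    # In a replicated join the relation listed last is the one held in memory.
    clause = " USING 'replicated'" if small_bytes <= threshold_bytes else ""
    return f"{out} = JOIN {big} BY {key}, {small} BY {key}{clause};"


if __name__ == "__main__":
    print(join_statement(50 * 1024 ** 2, 100 * 1024 ** 2))
    print(join_statement(500 * 1024 ** 2, 100 * 1024 ** 2))
```

The generated line would then be substituted into the script before it is
submitted. A safe threshold depends on the task heap size and on how much
the data expands once loaded, so it has to be found by experiment.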

~Aniket
On Fri, Jul 19, 2013 at 12:58 PM, Mehmet Tepedelenlioglu <
[EMAIL PROTECTED]> wrote:

> You can always split your tables so that the same keys end up in the
> same splits. Then you do a replicated join on the corresponding splits
> and take the union.
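
A small illustration of the idea above in plain Python rather than Pig:
both tables are partitioned by a hash of the join key, each pair of
matching partitions is joined with the small side held in an in-memory
table, and the results are concatenated. Every function name here is made
up for illustration, and rows are assumed to be (key, value) tuples whose
keys have the same type on both sides:

```python
import zlib
from collections import defaultdict


def bucket_of(key, n_buckets):
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    # Assumes equal keys have identical repr() on both sides of the join.
    return zlib.crc32(repr(key).encode("utf-8")) % n_buckets


def partition(rows, n_buckets):
    """Distribute (key, value) rows into n_buckets lists by key hash."""
    buckets = [[] for _ in range(n_buckets)]
    for row in rows:
        buckets[bucket_of(row[0], n_buckets)].append(row)
    return buckets


def replicated_join(big_rows, small_rows):
    """Inner join: the small side is loaded into a dict, the big side streamed."""
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return [(key, big_value, small_value)
            for key, big_value in big_rows
            for small_value in table.get(key, [])]


def split_join(big_rows, small_rows, n_buckets):
    """Join bucket i of the big side with bucket i of the small side, then union."""
    big_parts = partition(big_rows, n_buckets)
    small_parts = partition(small_rows, n_buckets)
    result = []
    for big_part, small_part in zip(big_parts, small_parts):
        result.extend(replicated_join(big_part, small_part))
    return result
```

Because a given key always lands in the same bucket on both sides, the
union of the per-bucket joins contains the same rows as a single join,
while only about 1/n_buckets of the small table has to be in memory at a
time (assuming the keys hash evenly).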
>
> On Jul 19, 2013, at 12:26 PM, Arun Ahuja <[EMAIL PROTECTED]> wrote:
>
> > I have been using a replicated join to join a very large data set
> > with another one that is about 1000x smaller.  I have generally seen
> > large performance gains.
> >
> > However, they do scale together, so that now, even though the RHS
> > table is still 1000x smaller, it is too large to fit into memory.
> > This will happen on only every 20th or so dataset that the join is
> > performed on, but I'd like to have something robust built to handle
> > this.
> >
> > Is there any way to set up the replicated join to fall back to a
> > regular join only on memory issues?  Or any type of conditional I
> > could set to check the dataset size first?  Willing to even dig into
> > the Pig code and implement this if anyone has ideas.
> >
> > Thanks
> >
> > Arun
>
>
--
"...:::Aniket:::... Quetzalco@tl"