Re: Replicated Join and OOM errors
Aniket Mokashi 2013-07-21, 19:28
Pig does not currently have a way to do this. Development of a feature
like this is tracked at https://issues.apache.org/jira/browse/PIG-2784.
Feel free to add a subtask and take a stab at it.
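
Until something like that exists, one possible workaround is to make the decision outside Pig. The following is only a minimal sketch, assuming the size of the smaller relation is measured before the script is launched and that the JOIN statement is generated by a wrapper. The helper names, the relation aliases, and the 100 MB threshold are all hypothetical and would need tuning for a real cluster.

```python
# Hypothetical wrapper logic: choose the Pig join strategy before launch.
# Nothing here is part of Pig itself; only the generated JOIN text is Pig Latin.

# Assumed threshold for "fits in memory"; tune it for your task heap size.
REPLICATED_LIMIT_BYTES = 100 * 1024 * 1024


def join_clause(small_bytes, limit_bytes=REPLICATED_LIMIT_BYTES):
    """Return the USING clause, or an empty string for a regular join."""
    return " USING 'replicated'" if small_bytes <= limit_bytes else ""


def render_join(big, small, key, small_bytes, limit_bytes=REPLICATED_LIMIT_BYTES):
    """Build the JOIN statement; the large relation is listed first,
    as a replicated join requires."""
    clause = join_clause(small_bytes, limit_bytes)
    return "joined = JOIN {0} BY {2}, {1} BY {2}{3};".format(big, small, key, clause)


if __name__ == "__main__":
    # Small side under the threshold: replicated join.
    print(render_join("big", "small", "id", small_bytes=20 * 1024 * 1024))
    # Small side over the threshold: regular join.
    print(render_join("big", "small", "id", small_bytes=5 * 1024 ** 3))
```

The generated statement could then be passed into the script, for example through Pig's parameter substitution. How the input size is obtained is left out on purpose, since it depends on where the data lives.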
On Fri, Jul 19, 2013 at 12:58 PM, Mehmet Tepedelenlioglu <
[EMAIL PROTECTED]> wrote:
> You can always split your tables so that the same keys end up in the same
> splits. Then you do a replicated join on the corresponding splits and take the
> On Jul 19, 2013, at 12:26 PM, Arun Ahuja <[EMAIL PROTECTED]> wrote:
> > I have been using a replicated join to join a very large set of data to
> > another one that is about 1000x smaller. I have generally seen large
> > gains.
> > However, they do scale together, so that now even though the RHS table is
> > still 1000x smaller, it is too large to fit into memory. This will happen
> > on only every 20th or so dataset that the join is performed on, but I'd like to
> > have something robust built to handle this.
> > Is there any way to set up the replicated join to fall back to a regular join
> > on memory issues? Or any type of conditional I could set to check the
> > dataset size first? I'm even willing to dig into the Pig code and implement
> > this if anyone has ideas.
> > Thanks
> > Arun