Pig >> mail # user >> Problem while using merge join


Re: Problem while using merge join
hi,

the join key is in the bag, that's the problem. The load function returns
only one element, $0, and that is the map. In the next step, the UDF
"MapToBagUDF" transforms this map into a bag. For example, if the load
function returns this ([col1,col2,col3), then the map inside the tuple is
transformed to:

(col1)
(col2)
(col3)

Maybe there is a way to transform the map into a bag directly in the load
function? The problem I see is that the next() method in the LoadFunc has
to return a Tuple, not a Bag. :/
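
To make the constraint concrete, here is a minimal sketch in plain Python,
not Pig code. It assumes the UDF emits one single-field tuple per map key
(as in the col1/col2/col3 example above) and models a merge join as a single
pass over two inputs that are already sorted ascending on the key. The names
map_to_bag and merge_join are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterator, List, Tuple


def map_to_bag(record: Dict[str, object]) -> List[Tuple[str]]:
    """Model of the map-to-bag step: one single-field tuple per map key.

    Assumption: the keys are emitted in ascending order, which is what a
    merge join needs on its input.
    """
    return [(key,) for key in sorted(record)]


def merge_join(
    left: List[Tuple[str]], right: List[Tuple[str]]
) -> Iterator[Tuple[str, str]]:
    """Sort-merge join on field 0; both inputs must be sorted ascending."""
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Locate the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left tuple carrying this key with that run.
            while i < len(left) and left[i][0] == left_key:
                for right_tuple in right[j:j_end]:
                    yield (left[i][0], right_tuple[0])
                i += 1
            j = j_end


if __name__ == "__main__":
    left_bag = map_to_bag({"col3": 1, "col1": 2, "col2": 3})
    right_bag = map_to_bag({"col2": 0, "col4": 0})
    print(list(merge_join(left_bag, right_bag)))
```

The single pass only works because neither input is reordered after sorting,
which is the property a transformation on the join key could break.
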
2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>

> Since your join key is not in the Bag, can you do your join first and then
> execute your UDF?
>
>
> On Fri, Sep 13, 2013 at 10:04 AM, John <[EMAIL PROTECTED]> wrote:
>
> > Okay, I think I have found the problem here:
> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... there it is
> > written:
> >
> > There may be filter statements and foreach statements between the sorted
> > data source and the join statement. The foreach statement should meet the
> > following conditions:
> >
> >    - There should be no UDFs in the foreach statement.
> >    - The foreach statement should not change the position of the join
> >    keys.
> >    - There should be no transformation on the join keys which will change
> >    the sort order.
> >
> >
> > I have to use a UDF to transform the map into a bag ... any idea for a
> > workaround?
> >
> > thanks
> >
> >
> > 2013/9/13 John <[EMAIL PROTECTED]>
> >
> > > Hi,
> > >
> > > I am trying to use a merge join on two bags. Here is my Pig code:
> > > http://pastebin.com/Y9b2UtNk .
> > >
> > > But I got this error:
> > >
> > > Caused by:
> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
> > > Sort, or Load as its predecessors. Found
> > >
> > > I think the reason is that there is no sort step or something like
> > > that. But the bags are definitely sorted. How can I do the merge join?
> > >
> > > thanks
> > >
> >
>