Pig, mail # user - Problem while using merge join


Re: Problem while using merge join
Shahab Yunus 2013-09-13, 19:00
Wouldn't this slow down your data retrieval? One column per call
instead of a batch?
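
To make the concern concrete, here is a minimal toy model in plain Java. It does not use the HBase client API: BatchSketch and chunksPerNext are made-up names, and the only thing it assumes is the behavior quoted below from the Scan javadoc, namely that setBatch caps the number of values returned per next() call. It just counts how many next() calls one row would take.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {
    // Hypothetical stand-in for scanning ONE row: splits the row's columns
    // into chunks of at most `batch` values, one chunk per simulated next() call.
    static List<List<String>> chunksPerNext(List<String> columns, int batch) {
        if (batch < 1) {
            throw new IllegalArgumentException("batch must be >= 1");
        }
        List<List<String>> results = new ArrayList<>();
        for (int i = 0; i < columns.size(); i += batch) {
            int end = Math.min(i + batch, columns.size());
            results.add(new ArrayList<>(columns.subList(i, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("col1", "col2", "col3");
        // Batch of 1: every simulated next() yields a single column.
        System.out.println(chunksPerNext(row, 1).size() + " next() call(s) with batch 1");
        // Batch of 3: the whole row comes back in one simulated next().
        System.out.println(chunksPerNext(row, 3).size() + " next() call(s) with batch 3");
    }
}
```

With a batch of 1 the number of next() calls per row equals the number of columns, so wide rows mean many more calls. Whether that is actually slower in practice would have to be measured against a real table.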

Regards,
Shahab
On Fri, Sep 13, 2013 at 2:34 PM, John <[EMAIL PROTECTED]> wrote:

> I think I might have found a way to transform it directly into a bag.
> Inside the HBaseStorage() load function I set the HBase scan batch to
> 1, so each scan.next() returns one column instead of all columns. See
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
>
> setBatch(int batch)
> Set the maximum number of values to return for each call to next()
>
> I think this will work. Any idea whether this approach has disadvantages?
>
> regards
>
>
> 2013/9/13 John <[EMAIL PROTECTED]>
>
> > hi,
> >
> > The join key is in the bag; that's the problem. The load function returns
> > only one element, $0, and that is the map. In the next step this map is
> > transformed into a bag with the UDF "MapToBagUDF". For example, the load
> > function returns ([col1,col2,col3]), and the map inside the tuple is
> > transformed to:
> >
> > (col1)
> > (col2)
> > (col3)
> >
> > Maybe there is a way to transform the map directly into a bag in the load
> > function? The problem I see is that the next() method in the LoadFunc has
> > to return a Tuple, not a Bag. :/
> >
> >
> >
> > 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
> >
> >> Since your join key is not in the Bag, can you do your join first and then
> >> execute your UDF?
> >>
> >>
> >> On Fri, Sep 13, 2013 at 10:04 AM, John <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Okay, I think I have found the problem here:
> >> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... there it is
> >> > written:
> >> >
> >> > There may be filter statements and foreach statements between the sorted
> >> > data source and the join statement. The foreach statement should meet the
> >> > following conditions:
> >> >
> >> >    - There should be no UDFs in the foreach statement.
> >> >    - The foreach statement should not change the position of the join keys.
> >> >    - There should be no transformation on the join keys which will change
> >> >    the sort order.
> >> >
> >> >
> >> > I have to use a UDF to transform the map into a bag ... any idea for a
> >> > workaround?
> >> >
> >> > thanks
> >> >
> >> >
> >> > 2013/9/13 John <[EMAIL PROTECTED]>
> >> >
> >> > > Hi,
> >> > >
> >> > > I am trying to use a merge join on 2 bags. Here is my pig code:
> >> > > http://pastebin.com/Y9b2UtNk .
> >> > >
> >> > > But I got this error:
> >> > >
> >> > > Caused by:
> >> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
> >> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
> >> > > Sort, or Load as its predecessors. Found
> >> > >
> >> > > I think the reason is that there is no sort function or something like
> >> > > this. But the bags are definitely sorted. How can I do the merge join?
> >> > >
> >> > > thanks
> >> > >
> >> >
> >>
> >
> >
>