Pig >> mail # user >> Problem while using merge join


Re: Problem while using merge join
Wouldn't this slow down your data retrieval? One column in each call
instead of a batch?

Regards,
Shahab
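Shahab's concern can be quantified: if setBatch(n) caps the number of values returned per next() call, the number of calls needed per row grows with columns/batch. A minimal Python sketch (not HBase client code; the function name is made up for illustration):

```python
import math

def next_calls_per_row(num_columns: int, batch: int) -> int:
    """Number of scanner next() calls needed to read one row when
    Scan.setBatch(batch) caps the values returned per call."""
    return math.ceil(num_columns / batch)

# With 100 columns per row:
print(next_calls_per_row(100, 100))  # batch covers the row: 1 call
print(next_calls_per_row(100, 1))    # batch=1: 100 calls, one per column
```

With batch=1, a wide row costs one next() call per column, which is the retrieval overhead Shahab is asking about.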
On Fri, Sep 13, 2013 at 2:34 PM, John <[EMAIL PROTECTED]> wrote:

> I think I might have found a way to transform it directly into a bag.
> Inside the HBaseStorage() Load Function I have set the HBase scan batch
> to 1, so I get one column for every scan.next() instead of all columns.
> See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html
>
> setBatch(int batch)
> Set the maximum number of values to return for each call to next()
>
> I think this will work. Any idea if this way has disadvantages?
>
> regards
>
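The setBatch behavior John describes — each next() returning at most `batch` columns of the current row — can be mimicked with a small Python generator (a sketch under the assumption that the scanner simply chunks a row's columns; the names are hypothetical, not HBase API):

```python
def scan_with_batch(row_columns, batch):
    """Yield successive next() results: each is a chunk of at most
    `batch` (column, value) pairs from the same row, mimicking
    what Scan.setBatch does to a wide row."""
    for i in range(0, len(row_columns), batch):
        yield row_columns[i:i + batch]

row = [("col1", "a"), ("col2", "b"), ("col3", "c")]
# batch=1 -> each next() returns a single column, as described above
print(list(scan_with_batch(row, 1)))
# -> [[('col1', 'a')], [('col2', 'b')], [('col3', 'c')]]
```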
>
> 2013/9/13 John <[EMAIL PROTECTED]>
>
> > Hi,
> >
> > the join key is in the bag, that's the problem. The Load Function
> > returns only one element, $0, and that is the map. This map is
> > transformed in the next step with the UDF "MapToBagUDF" into a bag.
> > For example, the load function returns this ([col1,col2,col3]), then
> > this map inside the tuple is transformed to:
> >
> > (col1)
> > (col2)
> > (col3)
> >
> > Maybe there is a way to transform the map directly into a bag in the
> > load function? The problem I see is that the next() method in the
> > LoadFunc has to return a Tuple and not a Bag. :/
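The map-to-bag transformation described above can be sketched in Python (the function name mirrors the "MapToBagUDF" mentioned in the thread; the exact field contents are assumptions based on the example in the email):

```python
def map_to_bag(m):
    """Sketch of the MapToBagUDF described above: turn a Pig map like
    [col1#v1,col2#v2,col3#v3] into a bag of one-field tuples, i.e.
    {(col1),(col2),(col3)}, represented here as a list of tuples."""
    return [(key,) for key in m]

print(map_to_bag({"col1": 1, "col2": 2, "col3": 3}))
# -> [('col1',), ('col2',), ('col3',)]
```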
> >
> >
> >
> > 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
> >
> >> Since your join key is not in the Bag, can you do your join first and
> then
> >> execute your UDF?
> >>
> >>
> >> On Fri, Sep 13, 2013 at 10:04 AM, John <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Okay, I think I have found the problem here:
> >> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... there
> >> > it is written:
> >> >
> >> > There may be filter statements and foreach statements between the
> >> > sorted data source and the join statement. The foreach statement
> >> > should meet the following conditions:
> >> >
> >> >    - There should be no UDFs in the foreach statement.
> >> >    - The foreach statement should not change the position of the join keys.
> >> >    - There should be no transformation on the join keys which will change the sort order.
> >> >
> >> >
> >> > I have to use a UDF to transform the Map into a Bag ... any
> >> > workaround ideas?
> >> >
> >> > thanks
> >> >
> >> >
> >> > 2013/9/13 John <[EMAIL PROTECTED]>
> >> >
> >> > > Hi,
> >> > >
> >> > > I'm trying to use a merge join on 2 bags. Here is my Pig code:
> >> > > http://pastebin.com/Y9b2UtNk .
> >> > >
> >> > > But I got this error:
> >> > >
> >> > > Caused by:
> >> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
> >> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach,
> >> > > Ascending Sort, or Load as its predecessors. Found
> >> > >
> >> > > I think the reason is that there is no sort function or something
> >> > > like this. But the bags are definitely sorted. How can I do the
> >> > > merge join?
> >> > >
> >> > > thanks
> >> > >
> >> >
> >>
> >
> >
>
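For reference, the access pattern a merge join relies on — two inputs already sorted ascending by the join key, scanned once in lockstep — can be sketched in Python (an illustrative inner join, not Pig's actual implementation):

```python
def merge_join(left, right, key=lambda t: t[0]):
    """Inner merge join over two inputs sorted ascending by key:
    advance whichever side has the smaller key, emit joined tuples
    on a match. Each input is scanned once, which is why Pig requires
    the sort order to be preserved up to the join."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # pair the current left row with every right row sharing the key
            j2 = j
            while j2 < len(right) and key(right[j2]) == kl:
                out.append(left[i] + right[j2])
                j2 += 1
            i += 1
    return out

a = [("k1", 1), ("k2", 2), ("k3", 3)]
b = [("k2", "x"), ("k3", "y"), ("k4", "z")]
print(merge_join(a, b))
# -> [('k2', 2, 'k2', 'x'), ('k3', 3, 'k3', 'y')]
```

If either input arrives through an operator that could disturb the sort order (such as a UDF in a foreach), Pig cannot guarantee this lockstep scan is valid, which is the restriction the ERROR 1103 message enforces.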