Pig >> mail # user >> Problem while using merge join


Re: Problem while using merge join
I think I might have found a way to transform it directly into a bag.
Inside the HBaseStorage() load function I set the HBase scan batch size to
1, so every scan.next() call returns one column instead of all columns. See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html

setBatch(int batch)
Set the maximum number of values to return for each call to next()

I think this will work. Does anyone see any disadvantages to this approach?
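
To illustrate what a batch size of 1 does to a wide row, here is a minimal, self-contained Java sketch. It does not use the HBase client at all: the class BatchSketch and the method batches are hypothetical names that only simulate how a batched scan splits one row's columns across successive next() calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation only; no HBase classes are involved.
class BatchSketch {
    // Split one row's columns into chunks of at most `batch` values,
    // mimicking how a batched scan returns a wide row piece by piece.
    static List<List<String>> batches(List<String> columns, int batch) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < columns.size(); i += batch) {
            int end = Math.min(i + batch, columns.size());
            out.add(new ArrayList<>(columns.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("col1", "col2", "col3");
        // Batch of 1: one column per simulated next() call.
        System.out.println(batches(row, 1));  // [[col1], [col2], [col3]]
        // Batch larger than the row: the whole row in one call.
        System.out.println(batches(row, 10)); // [[col1, col2, col3]]
    }
}
```

One likely consequence of this approach: a row with N columns then arrives as N partial results, so any code that assumes one result per row has to be adjusted, and the number of next() calls grows with the column count.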

regards
2013/9/13 John <[EMAIL PROTECTED]>

> hi,
>
> the join key is in the bag, that's the problem. The load function returns
> only one element, $0, and that is the map. This map is transformed in the
> next step into a bag by the UDF "MapToBagUDF". For example, if the load
> function returns ([col1,col2,col3]), then this map inside the tuple is
> transformed to:
>
> (col1)
> (col2)
> (col3)
>
> Maybe there is a way to transform the map directly into a bag in the load
> function? The problem I see is that the next() method in the LoadFunc has
> to return a Tuple, not a Bag. :/
>
>
>
> 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
>
>> Since your join key is not in the Bag, can you do your join first and then
>> execute your UDF?
>>
>>
>> On Fri, Sep 13, 2013 at 10:04 AM, John <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Okay, I think I have found the problem here:
>> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... it says:
>> >
>> > There may be filter statements and foreach statements between the sorted
>> > data source and the join statement. The foreach statement should meet the
>> > following conditions:
>> >
>> >    - There should be no UDFs in the foreach statement.
>> >    - The foreach statement should not change the position of the join keys.
>> >    - There should be no transformation on the join keys which will change
>> >    the sort order.
>> >
>> >
>> > I have to use a UDF to transform the map into a bag ... any ideas for a
>> > workaround?
>> >
>> > thanks
>> >
>> >
>> > 2013/9/13 John <[EMAIL PROTECTED]>
>> >
>> > > Hi,
>> > >
>> > > I am trying to use a merge join on 2 bags. Here is my Pig code:
>> > > http://pastebin.com/Y9b2UtNk .
>> > >
>> > > But I got this error:
>> > >
>> > > Caused by:
>> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
>> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
>> > > Sort, or Load as its predecessors. Found
>> > >
>> > > I think the reason is that there is no sort function or something like
>> > > this. But the bags are definitely sorted. How can I do the merge join?
>> > >
>> > > thanks
>> > >
>> >
>>
>
>
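
For readers wondering why the documentation quoted above forbids transformations that change the sort order of the join keys, here is a rough, self-contained Java sketch of a single-pass merge join over two key lists. It is only an illustration of the general technique, not Pig's implementation; MergeJoinSketch and mergeJoin are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a sort-merge join; not Pig's actual code.
class MergeJoinSketch {
    // Join two key lists in one pass. Both lists are ASSUMED to be sorted
    // ascending; one output entry is emitted per matching pair of keys.
    static List<String> mergeJoin(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                i++;                      // left key is smaller, advance left
            } else if (cmp > 0) {
                j++;                      // right key is smaller, advance right
            } else {
                String key = left.get(i);
                int jStart = j;           // start of the equal-key run on the right
                while (i < left.size() && left.get(i).equals(key)) {
                    j = jStart;           // rescan the right-hand run for each left row
                    while (j < right.size() && right.get(j).equals(key)) {
                        out.add(key);
                        j++;
                    }
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both inputs sorted: every match is found.
        System.out.println(mergeJoin(Arrays.asList("col1", "col3"),
                                     Arrays.asList("col1", "col3"))); // [col1, col3]
        // Left input out of order: the col1 match is silently missed.
        System.out.println(mergeJoin(Arrays.asList("col3", "col1"),
                                     Arrays.asList("col1", "col3"))); // [col3]
    }
}
```

Because each input is only read forward, the sketch has no way to recover a key that appears out of order, which is presumably why Pig refuses a plan where it cannot be sure the sort order is preserved.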