Pig, mail # user - Problem while using merge join


John 2013-09-13, 16:37
John 2013-09-13, 17:04
Pradeep Gollakota 2013-09-13, 17:41
John 2013-09-13, 17:58
Re: Problem while using merge join
John 2013-09-13, 18:34
I think I might have found a way to transform it directly into a bag.
Inside the HBaseStorage() load function I have set the HBase scan batch to
1, so every call to scan.next() returns one column instead of all columns. See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html

setBatch(int batch)
Set the maximum number of values to return for each call to next()

I think this will work. Any idea whether this approach has disadvantages?
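
To make the effect of the batch setting concrete, here is a minimal sketch in plain Java. It only models the idea and does not use the HBase client: the class BatchScanSketch and the method nextCalls are hypothetical names, and a row is assumed to be a simple column-to-value map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of scan batching; hypothetical, not the HBase API. */
public class BatchScanSketch {

    /**
     * Splits one row's columns into chunks of at most `batch` entries,
     * modelling how a batched scan hands back a wide row piece by piece.
     */
    public static List<Map<String, String>> nextCalls(Map<String, String> row, int batch) {
        if (batch < 1) {
            throw new IllegalArgumentException("batch must be >= 1");
        }
        List<Map<String, String>> calls = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            current.put(e.getKey(), e.getValue());
            if (current.size() == batch) {
                // Chunk is full: this is what one next() call would return.
                calls.add(current);
                current = new LinkedHashMap<>();
            }
        }
        if (!current.isEmpty()) {
            calls.add(current); // trailing partial chunk
        }
        return calls;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("col1", "a");
        row.put("col2", "b");
        row.put("col3", "c");
        System.out.println(nextCalls(row, 1));  // one column per call
        System.out.println(nextCalls(row, 10)); // whole row in one call
    }
}
```

With a batch of 1, the three-column row comes back as three separate one-column results, which is the "one column per next()" behaviour described above. One consequence worth keeping in mind: a row is no longer delivered as a single unit, so any code that expects all columns of a row in one tuple would have to reassemble them.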

regards
2013/9/13 John <[EMAIL PROTECTED]>

> hi,
>
> the join key is in the bag, that's the problem. The load function returns
> only one element, $0, and that is the map. This map is transformed in the
> next step into a bag with the UDF "MapToBagUDF". For example, the load
> function returns ([col1,col2,col3]); the map inside that tuple is then
> transformed to:
>
> (col1)
> (col2)
> (col3)
>
> Maybe there is a way to transform the map directly into a bag in the load
> function? The problem I see is that the getNext() method in the LoadFunc has
> to return a Tuple and not a Bag. :/
>
>
>
> 2013/9/13 Pradeep Gollakota <[EMAIL PROTECTED]>
>
>> Since your join key is not in the Bag, can you do your join first and then
>> execute your UDF?
>>
>>
>> On Fri, Sep 13, 2013 at 10:04 AM, John <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Okay, I think I have found the problem here:
>> > http://pig.apache.org/docs/r0.11.1/perf.html#merge-joins ... it says:
>> >
>> > There may be filter statements and foreach statements between the sorted
>> > data source and the join statement. The foreach statement should meet the
>> > following conditions:
>> >
>> >    - There should be no UDFs in the foreach statement.
>> >    - The foreach statement should not change the position of the join keys.
>> >    - There should be no transformation on the join keys which will change
>> >    the sort order.
>> >
>> >
>> > I have to use a UDF to transform the map into a bag ... any idea for a
>> > workaround?
>> >
>> > thanks
>> >
>> >
>> > 2013/9/13 John <[EMAIL PROTECTED]>
>> >
>> > > Hi,
>> > >
>> > > I am trying to use a merge join on 2 bags. Here is my Pig code:
>> > > http://pastebin.com/Y9b2UtNk .
>> > >
>> > > But I got this error:
>> > >
>> > > Caused by:
>> > > org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
>> > > ERROR 1103: Merge join/Cogroup only supports Filter, Foreach, Ascending
>> > > Sort, or Load as its predecessors. Found
>> > >
>> > > I think the reason is that there is no sort function or something like
>> > > this. But the bags are definitely sorted. How can I do the merge join?
>> > >
>> > > thanks
>> > >
>> >
>>
>
>
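
The merge join discussed in the quoted messages depends on both inputs already being sorted on the join key. As a rough illustration of why that matters, here is a minimal sketch in plain Java. It is not Pig's implementation: the class MergeJoinSketch, the method mergeJoin, and the {key, value} record layout are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of a sort-merge join; hypothetical, not Pig code. */
public class MergeJoinSketch {

    /**
     * Joins two lists of {key, value} records. Both lists are assumed to be
     * sorted ascending by key; the method does not check this.
     */
    public static List<String> mergeJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) {
                i++; // left key is smaller: advance left
            } else if (cmp > 0) {
                j++; // right key is smaller: advance right
            } else {
                // Equal keys: emit the cross product of both runs of this key.
                String key = left.get(i)[0];
                int runStart = j;
                while (i < left.size() && left.get(i)[0].equals(key)) {
                    j = runStart;
                    while (j < right.size() && right.get(j)[0].equals(key)) {
                        out.add(key + ":" + left.get(i)[1] + "," + right.get(j)[1]);
                        j++;
                    }
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = Arrays.asList(
                new String[] {"a", "1"}, new String[] {"b", "2"}, new String[] {"d", "4"});
        List<String[]> right = Arrays.asList(
                new String[] {"b", "x"}, new String[] {"c", "y"}, new String[] {"d", "z"});
        System.out.println(mergeJoin(left, right));
    }
}
```

Each input is read once, front to back, with no hashing or re-sorting. The price is that an out-of-order input makes the loop skip matches without any error, which is presumably why Pig is strict about what may sit between the sorted source and the join.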
Shahab Yunus 2013-09-13, 19:00
John 2013-09-13, 19:06
Pradeep Gollakota 2013-09-13, 20:16
John 2013-09-13, 20:51