Pradeep Gollakota 2013-05-30, 19:12
Re: Join on custom LoadFunc not working correctly
Does anyone have any thoughts on this?

I'm completely out of ideas on this.
On Thu, May 30, 2013 at 3:12 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:

> Hey guys,
>
> I have a custom Storage function that loads from the Accumulo database
> (similar to HBase).
> I have the following script that I'm trying to execute:
>
> A = load 'accumulo://table_a'
>          using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2',
> '-loadKey')
>          as (id: chararray, a: chararray, b: chararray);
> B = load 'accumulo://table_b'
>          using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2',
> '-loadKey')
>          as (id: chararray, a: chararray, b: chararray);
> C = join A by a, B by b;
> dump C;
>
> When I execute this, dataset A is not getting loaded.
> If I do the following:
> C = join B by b, A by a;
> A is loaded, but B is not.
>
> The current workaround I have for this is to store A and B into temporary
> storage using PigStorage() and load them again to do my join. However,
> that's extra read/write phases that I'd like to avoid. In my implementation
> of the AccumuloStorage() function, I set pig.noSplitCombination to true.
>
> I'm not sure what the problem with my LoadFunc is and why it's not loading
> both datasets correctly.
>
> Any help would be appreciated.
>
> Thanks
> Pradeep
>