Pig >> mail # user >> Join on custom LoadFunc not working correctly


Pradeep Gollakota 2013-05-30, 19:12
Re: Join on custom LoadFunc not working correctly
Does anyone have any thoughts on this?

I'm completely out of ideas on this.
On Thu, May 30, 2013 at 3:12 PM, Pradeep Gollakota <[EMAIL PROTECTED]> wrote:

> Hey guys,
>
> I have a custom Storage function that loads from the Accumulo database
> (similar to HBase).
> I have the following script that I'm trying to execute:
>
> A = load 'accumulo://table_a'
>          using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
>          as (id: chararray, a: chararray, b: chararray);
> B = load 'accumulo://table_b'
>          using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
>          as (id: chararray, a: chararray, b: chararray);
> C = join A by a, B by b;
> dump C;
>
> When I execute this, dataset A is not getting loaded.
> If I do the following:
> C = join B by b, A by a;
> A is loaded, but B is not.
>
> The current workaround I have for this is to store A and B into temporary
> storage using PigStorage() and load them again to do my join. However,
> those are extra read/write phases that I'd like to avoid. In my implementation
> of the AccumuloStorage() function, I set pig.noSplitCombination to true.
>
> I'm not sure what the problem with my LoadFunc is and why it's not loading
> both datasets correctly.
>
> Any help would be appreciated.
>
> Thanks
> Pradeep
>
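
For readers following the thread, the workaround described in the quoted message might look roughly like the sketch below. The loader class, its arguments, and the schema are taken from the original script; the temporary paths under /tmp/join_workaround and the aliases A_raw and B_raw are hypothetical and chosen only for illustration.

```pig
-- Sketch of the store-then-reload workaround (paths and *_raw aliases are assumptions).
A_raw = load 'accumulo://table_a'
        using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
        as (id: chararray, a: chararray, b: chararray);
B_raw = load 'accumulo://table_b'
        using org.apache.accumulo.pig.AccumuloStorage('cf:cq1 cf:cq2', '-loadKey')
        as (id: chararray, a: chararray, b: chararray);

-- Materialize both relations to temporary storage with PigStorage.
store A_raw into '/tmp/join_workaround/a' using PigStorage();
store B_raw into '/tmp/join_workaround/b' using PigStorage();

-- Reload the materialized copies and join those instead of the Accumulo-backed relations.
A = load '/tmp/join_workaround/a' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
B = load '/tmp/join_workaround/b' using PigStorage()
    as (id: chararray, a: chararray, b: chararray);
C = join A by a, B by b;
dump C;
```

If the store and the reload should not share one run, the same statements could be split into two scripts, one that writes the temporary files and one that joins them. Either way this costs the extra read/write phases mentioned above, so it sidesteps the loading problem without explaining it.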