Pig >> mail # user >> push down filters for HbaseStorage


push down filters for HbaseStorage
Hi folks,

We have a ~35 GB HBase table that's split across several hundred regions.
I'm using the Pig version bundled with CDH3u1, which is 0.8.1 plus a few
patches.  In particular, it includes PIG-1680.

With the push-down filters from PIG-1680, I expected a LOAD/FILTER combo
like [1] to create map tasks only for the regions that overlap the
requested key space (e.g., keys greater than '12344323413'). Instead, I see
a map task created for every region in the table. Was my assumption off?
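
The expectation described above can be sketched in a few lines of Python. This is only an illustration of the region pruning the question assumes, not how HBaseStorage or HBase actually computes input splits. The function name `regions_overlapping` and the region boundaries are hypothetical, and row keys are assumed to be ASCII strings compared lexicographically, as HBase compares row-key bytes.

```python
from typing import List, Tuple

# A region is (start_key, end_key): start is inclusive, end is exclusive.
# An empty string means "unbounded", as for the first and last region of a table.
Region = Tuple[str, str]


def regions_overlapping(regions: List[Region], lower_bound: str) -> List[Region]:
    """Return the regions that could hold at least one row key >= lower_bound."""
    selected = []
    for start, end in regions:
        # A region can be skipped only if it ends at or before the lower bound.
        if end == "" or end > lower_bound:
            selected.append((start, end))
    return selected


# Hypothetical region boundaries, for illustration only.
example_regions = [("", "1"), ("1", "12"), ("12", "2"), ("2", "")]

# Under the assumption in the question, only these regions would get a map task.
print(regions_overlapping(example_regions, "12344323413"))
```

With pruning of this kind, the number of map tasks would track the number of regions at or beyond the lower bound rather than the total region count.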

FWIW, I see the same results if I use the -gte param to HBaseStorage.

Norbert

[1]
cvps = LOAD 'hbase://cvps'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:value', '-loadKey')
    AS (rowkey:chararray, datavalue:chararray);
A = FILTER cvps BY rowkey > '12344323413';

Bill Graham 2011-08-15, 16:37
Norbert Burger 2011-08-15, 17:20
Bill Graham 2011-08-15, 18:13