push down filters for HbaseStorage
Hi folks,

We have a ~35 GB HBase table that's split across several hundred regions.
I'm using the Pig version bundled with CDH3u1, which is 0.8.1 plus a few
patches.  In particular, it includes PIG-1680.

With the push down filters from PIG-1680, my thought was that a LOAD/FILTER
combo like [1] would only result in map tasks being created for the regions
that overlap the requested key space (e.g., row keys greater than
'12344323413'). Instead, I see a map task being created for every region in
the table. Was my assumption off?

FWIW, I see the same result if I use the -gte option to HBaseStorage.


[1]
cvps = LOAD 'hbase://cvps' USING
  org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:value', '-loadKey') AS
  (rowkey:chararray, datavalue:chararray);
A = FILTER cvps BY rowkey > '12344323413';
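
For reference, the -gte variant mentioned above would look roughly like the following. This is only a sketch: it assumes -gte is passed in the same options string as -loadKey, and it reuses the key from the FILTER. Note that -gte is inclusive, whereas the FILTER in [1] uses a strict greater-than.

```pig
-- Sketch of the -gte variant; assumes -gte goes in the same options string as -loadKey.
-- The lower bound is inclusive here, unlike the strict '>' in the FILTER above.
cvps = LOAD 'hbase://cvps' USING
  org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:value', '-loadKey -gte 12344323413') AS
  (rowkey:chararray, datavalue:chararray);
```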