We have a ~35 GB HBase table that's split across several hundred regions.
I'm using the Pig version bundled with CDH3u1, which is 0.8.1 plus a few
patches. In particular, it includes PIG-1680.
With the push-down filters from PIG-1680, my thought was that a LOAD/FILTER
combo like the one below would only result in map tasks being created for the
regions that overlap the requested key space (e.g., greater than '12344323413').
Instead I see a map task being created for every region in the table. Was
my assumption off?
FWIW, I see the same results if I use the -gte param to HBaseStorage.
cvps = LOAD 'hbase://cvps' USING
A = FILTER cvps BY rowkey > '12344323413';
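
To make the expectation concrete, the pruning I had in mind can be sketched in a few lines of Python. This is only an illustration of my assumption, not how HBaseStorage or its input format actually works; the function name and the region boundaries are made up, and it assumes the usual HBase convention of an inclusive start key, an exclusive end key, and an empty end key for the last region.

```python
def regions_to_scan(regions, lower_bound):
    """Return regions that may hold row keys strictly greater than lower_bound.

    Each region is a (start_key, end_key) pair of bytes. The start key is
    inclusive, the end key is exclusive, and b"" as the end key means the
    region is open-ended (it is the last region of the table).
    """
    return [
        (start, end)
        for start, end in regions
        # Keys are compared lexicographically as bytes, as HBase does.
        if end == b"" or end > lower_bound
    ]


# Hypothetical region boundaries, not taken from the real 'cvps' table.
regions = [
    (b"", b"10000000000"),
    (b"10000000000", b"12000000000"),
    (b"12000000000", b"13000000000"),
    (b"13000000000", b""),
]

selected = regions_to_scan(regions, b"12344323413")
print(len(selected), "of", len(regions), "regions would need a map task")
```

Under that assumption only the last two regions would get a map task, whereas what I actually observe corresponds to all of them being selected.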
Bill Graham 2011-08-15, 16:37
Norbert Burger 2011-08-15, 17:20
Bill Graham 2011-08-15, 18:13