Pig >> mail # user >> push down filters for HbaseStorage


Re: push down filters for HbaseStorage
Bill, thanks for your quick response.  I just tried to put together a
debug log for the -gte case to provide more info, and realized that it WAS
working as advertised (map tasks are created only for overlapping regions).
Sorry for the false alarm.

Out of curiosity, is there a JIRA ticket that tracks the FILTER version of
this?  PIG-1205 seems to be an umbrella ticket for all the changes.

Norbert

On Mon, Aug 15, 2011 at 12:37 PM, Bill Graham <[EMAIL PROTECTED]> wrote:

> I don't think the predicate push-down you're showing in [1] is currently
> supported, but the -gte param in the constructor definitely is (see
> HBaseTableInputFormat and PIG-1205). If that's not working, then it's a
> bug. Is there anything helpful in the logs?
>
> On Mon, Aug 15, 2011 at 9:19 AM, Norbert Burger <[EMAIL PROTECTED]> wrote:
>
> > Hi folks,
> >
> > We have a ~35 GB HBase table that's split across several hundred regions.
> > I'm using the Pig version bundled with CDH3u1, which is 0.8.1 plus a few
> > patches.  In particular, it includes PIG-1680.
> >
> > With the push down filters from PIG-1680, my thought was that a
> > LOAD/FILTER combo like [1] would only result in map tasks being created
> > for the regions that overlap the requested key space (e.g., greater than
> > '12344323413').  Instead I see a map task being created for every region
> > in the table.  Was my assumption off?
> >
> > Fwiw, I see the same results if I use the -gte param to HBaseStorage.
> >
> > Norbert
> >
> > [1]
> > cvps = LOAD 'hbase://cvps' USING
> >     org.apache.pig.backend.hadoop.hbase.HBaseStorage('data:value', '-loadKey')
> >     AS (rowkey:chararray, datavalue:chararray);
> > A = FILTER cvps BY rowkey > '12344323413';
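
For reference, a minimal sketch of the constructor-based form discussed
above, reusing the table, column, and key from [1].  It assumes the options
string takes space-separated flags, with -gte followed by the lower-bound
row key; the exact option syntax may differ between Pig versions, so check
it against the HBaseStorage shipped with your build.

    -- Assumption: '-gte <key>' in the options string sets an inclusive
    -- lower bound on the row key, so only overlapping regions get map tasks.
    cvps = LOAD 'hbase://cvps' USING
        org.apache.pig.backend.hadoop.hbase.HBaseStorage(
            'data:value', '-loadKey -gte 12344323413')
        AS (rowkey:chararray, datavalue:chararray);

Note that -gte is inclusive, while the FILTER in [1] uses a strict ">", so
the row with key '12344323413' itself is returned here if it exists.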