Re: How to load a subset of an HBase table (timestamp based) ?
You can instruct HBaseStorage to load only a subset of the rows, selected by
row key, using its "-gt" and "-lt" options, documented here [1].
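
As a rough sketch of what that could look like, assuming a table named
'mytable' with a column family 'cf' and columns 'col1' and 'col2' (all
hypothetical names, as is the key value 'something'):

```pig
-- Load only rows whose key is strictly greater than 'something'.
-- Table, column family, and column names are placeholders.
raw = LOAD 'hbase://mytable'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'cf:col1 cf:col2',
          '-loadKey true -gt something')
      AS (rowkey:chararray, col1:chararray, col2:chararray);
```

Adding "-lt some_upper_key" to the same option string should bound the scan
from above as well. These options restrict by row key only, not by timestamp.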

I don't believe querying by timestamp is currently supported in Pig, based
on the comments on [2].  A standalone JIRA has been created for it [3].


[2] https://issues.apache.org/jira/browse/PIG-1782
[3] https://issues.apache.org/jira/browse/PIG-1832

On Thu, Jul 28, 2011 at 6:18 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi,
> I'd like to make Pig load only a subset of an HBase table, based on the
> timestamp of the records, or on the key of the rows.
> As an example, I'd like to load only records that have a timestamp > N, or
> a key > "something".
> I know that HBase scanners are highly optimized for this kind of thing,
> and using them would greatly reduce the time needed to load my
> data.
> Is there any way to do this?
> If not, is it planned to be added to the HBase loader?
> If not, is it technically possible to do it?
> If yes, can I contribute and propose a patch for that?
> Thanks a lot!