Pig >> mail # user >> How to load a subset of an HBase table (timestamp based) ?


Vincent Barat 2011-07-28, 10:18
Re: How to load a subset of an HBase table (timestamp based) ?
You can instruct HBaseStorage to load only the rows whose keys fall within a
given range, using its "-gt" and "-lt" options, documented here [1].
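
A minimal sketch of such a load, assuming a hypothetical table 'events' with
columns 'user' and 'action' in an 'info' family and row keys of the form
row_NNNN (none of these names come from the thread). The bounds apply to the
row key and are both exclusive:

```pig
-- Hypothetical table, column family, and key bounds, for illustration only.
-- '-gt' keeps rows with key > row_0100, '-lt' keeps rows with key < row_0200.
-- HBase compares row keys as bytes, so the range is lexicographic.
events = LOAD 'hbase://events'
         USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
             'info:user info:action',
             '-gt row_0100 -lt row_0200')
         AS (user:chararray, action:chararray);

DUMP events;
```

Because the bounds are handed to the HBase scan rather than applied in a later
FILTER, rows outside the range should not be read at all.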

I don't believe querying by timestamp is currently supported in Pig, based
on the comments to [2].  There is a standalone JIRA that's been created [3].

Norbert

[1]
http://ofps.oreilly.com/titles/9781449302641/community.html#hbase_options_table
[2] https://issues.apache.org/jira/browse/PIG-1782
[3] https://issues.apache.org/jira/browse/PIG-1832

On Thu, Jul 28, 2011 at 6:18 AM, Vincent Barat <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'd like to make PIG load only a subset of an HBase table, based on the
> timestamp of the records, or on the key of the rows.
>
> As an example, I'd like to load only records that have a timestamp > N, or
> a key > "something".
>
> I know that HBase can handle scanners that are highly optimized to perform
> this kind of thing, and it would greatly reduce the time needed to load my
> data.
>
> Is there any way to do this?
> If not, is it planned to be added to the HBase loader?
> If not, is it technically possible to do it?
> If yes, can I contribute and propose a patch for that?
>
> Thanks a lot!
>
Vincent Barat 2011-07-28, 12:53
Norbert Burger 2011-07-28, 13:00
Bill Graham 2011-07-28, 17:26