HBase, mail # user - Extract a whole table for a given time(stamp)


Gaurav Pandit 2013-05-06, 17:27
Jean-Daniel Cryans 2013-05-06, 17:33
Gaurav Pandit 2013-05-06, 17:40
Re: Extract a whole table for a given time(stamp)
Shahab Yunus 2013-05-06, 17:50
Gaurav, when you say you want older versions of the data, are you
talking about filtering on the internal timestamps (and hence HBase's
internal versioning mechanism), or does your data have a separate column
for versioning (i.e., custom versioning)? If the latter, you can use Pig.
It can also dump your data directly into an RDBMS like MySQL, since a
DBStorage loader/storer is available.

Might not be totally applicable to your issue but just wanted to share a
thought.

Regards,
Shahab
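Whichever versioning scheme applies, the per-row selection logic is the same: for each (rowkey, column), keep the newest cell whose timestamp is at or before the cutoff. A minimal Python sketch of that logic over already-fetched cells — the tuple layout and sample data below are made up for illustration, not the output of a real HBase client call:

```python
# Select, per (rowkey, column), the newest value with timestamp <= cutoff.
# Cells are (rowkey, column, timestamp_ms, value) tuples, e.g. as collected
# from a Java or Thrift API scan (the sample data here is illustrative only).

def as_of(cells, cutoff_ms):
    snapshot = {}
    for rowkey, column, ts, value in cells:
        if ts > cutoff_ms:
            continue  # version is newer than the requested point in time
        key = (rowkey, column)
        # keep only the latest surviving version per cell
        if key not in snapshot or ts > snapshot[key][0]:
            snapshot[key] = (ts, value)
    return {k: v for k, (ts, v) in snapshot.items()}

cells = [
    ("row1", "cf:a", 100, "old"),
    ("row1", "cf:a", 200, "new"),
    ("row1", "cf:a", 300, "too-new"),
    ("row2", "cf:a", 150, "only"),
]
print(as_of(cells, 250))
# {('row1', 'cf:a'): 'new', ('row2', 'cf:a'): 'only'}
```

This is what a timestamped Scan does server-side; the sketch is only useful if you end up post-filtering versions on the client.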
On Mon, May 6, 2013 at 1:40 PM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:

> Thanks J-D.
>
> Wouldn't the export utility export the data in sequence file format? My
> goal is to generate data in some sort of delimited plain text file and hand
> it over to the caller.
>
> - Gaurav
>
>
> On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
>
> > You can use the Export MR job provided with HBase, it lets you set a time
> > range: http://hbase.apache.org/book.html#export
> >
> > J-D
> >
> >
> > On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:
> >
> > > Hi HBase users,
> > >
> > > We have a use case where we need to know how data looked at a given
> > > time in the past.
> > >
> > > The data is stored in HBase of course, with multiple versions. And the
> > > goal is to be able to extract all records (rowkey, columns) as of a
> > > given timestamp, to a file.
> > >
> > >
> > > I am trying to figure out the best way to achieve this.
> > >
> > > The options I know are:
> > > 1. Write a *Java* client using the HBase Java API, and scan the HBase
> > > table.
> > > 2. Do the same, but over the *Thrift* HBase API using Perl (since
> > > our environment is mostly Perl).
> > > 3. Use *Hive* to point to the HBase table, and use Sqoop to extract
> > > data from the Hive table into the client / RDBMS.
> > > 4. Use *Pig* to extract data from the HBase table, dump it on HDFS,
> > > and move the file over to the client.
> > >
> > > So far, I have successfully implemented option (2). I am still running
> > > some tests to see how it performs, but it works fine as such.
> > >
> > > My questions are:
> > > 1. Is option (3) or (4) even possible? I am not sure if we can access
> > > the table for a given timestamp over Pig or Hive.
> > > 2. Is there any better way of achieving this?
> > >
> > >
> > > Thanks!
> > > Gaurav
> > >
> >
>
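On the Export route J-D mentioned: the Export job's optional time-range arguments are expressed in milliseconds since the Unix epoch, the same unit HBase uses for cell timestamps. A small Python helper for turning a wall-clock cutoff into that value — the invocation shown in the trailing comment is a sketch of the command from the HBase book, not something verified here:

```python
# Convert a wall-clock cutoff into the millisecond value HBase uses for
# cell timestamps (and for the Export job's starttime/endtime arguments).
from datetime import datetime, timezone

def to_hbase_ms(dt):
    """Milliseconds since the Unix epoch, as HBase stores timestamps."""
    return int(dt.timestamp() * 1000)

cutoff = datetime(2013, 5, 6, 0, 0, tzinfo=timezone.utc)
endtime = to_hbase_ms(cutoff)
print(endtime)  # 1367798400000

# Per the HBase book, the Export invocation then looks roughly like
# (argument order as documented there; treat as a sketch):
#   hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> \
#       [<versions> [<starttime> [<endtime>]]]
```

Note the output of Export is still a SequenceFile, so Gaurav's delimited-text requirement would need a post-processing step regardless.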
Gaurav Pandit 2013-05-06, 18:19
Jean-Daniel Cryans 2013-05-06, 18:30
Gaurav Pandit 2013-05-06, 18:34
Jean-Daniel Cryans 2013-05-06, 17:49