HBase, mail # user - Extract a whole table for a given time(stamp)


Re: Extract a whole table for a given time(stamp)
Jean-Daniel Cryans 2013-05-06, 18:30
You could save some time by using
http://hbase.apache.org/book.html#copytable

J-D
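
As a rough sketch of how CopyTable could be pointed at a time range: the helper below only assembles the command line and does not run anything. The --starttime, --endtime and --new.name options are the ones described in the CopyTable section linked above. The table names, the helper name copytable_command, and the default start time of 1 (rather than 0, in case a zero start time is treated as "no range") are assumptions made for illustration. The end of the range is exclusive, hence the "+ 1". Check the options against the CopyTable usage text of your HBase version.

```python
from typing import List

# Fully qualified class name of the CopyTable MapReduce job shipped with HBase.
COPYTABLE_CLASS = "org.apache.hadoop.hbase.mapreduce.CopyTable"


def copytable_command(source_table: str, target_table: str,
                      as_of_ms: int, start_ms: int = 1) -> List[str]:
    """Build (but do not run) a CopyTable command line for cells up to as_of_ms.

    source_table / target_table are hypothetical names supplied by the caller.
    start_ms defaults to 1, which is an assumption of this sketch.
    """
    if as_of_ms < start_ms:
        raise ValueError("as_of_ms must not be earlier than start_ms")
    return [
        "hbase", COPYTABLE_CLASS,
        f"--starttime={start_ms}",
        # The upper bound of the time range is exclusive, so add 1 ms
        # to include cells written exactly at as_of_ms.
        f"--endtime={as_of_ms + 1}",
        f"--new.name={target_table}",
        source_table,
    ]


if __name__ == "__main__":
    # Illustrative table names and timestamp only.
    print(" ".join(copytable_command("events", "events_asof", 1367865000000)))
```

The copy still lands in an HBase table, so producing a plain text file would need a further step.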
On Mon, May 6, 2013 at 11:19 AM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:

> Thanks for your inputs, J-D, Shahab.
>
> Sorry if I was ambiguous in stating what I wanted to do. Just to restate
> the goal in one line:
> "Extract all rows (with rowkey, columns) from an HBase table as of a given
> time using HBase timestamps/versions, in a plain text file format"
>
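
To make the "as of a given time" goal concrete, here is a minimal in-memory sketch. It does not talk to HBase: the Cell tuple layout and the function names snapshot_as_of and to_delimited are assumptions made for illustration. For every (row, column) pair it keeps the newest version whose timestamp is at or before the requested time, then renders the result as tab-delimited lines. It ignores deletes, and it cannot recover versions that HBase has already discarded.

```python
from typing import Dict, List, Tuple

# One stored version of a cell: (row key, column, timestamp in ms, value).
# This layout is a hypothetical stand-in for what a scan would return.
Cell = Tuple[str, str, int, str]
Snapshot = Dict[Tuple[str, str], Tuple[int, str]]


def snapshot_as_of(cells: List[Cell], as_of_ms: int) -> Snapshot:
    """Keep, per (row, column), the newest version with timestamp <= as_of_ms."""
    latest: Snapshot = {}
    for row, column, ts, value in cells:
        if ts > as_of_ms:
            continue  # written after the requested point in time
        key = (row, column)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return latest


def to_delimited(snapshot: Snapshot, sep: str = "\t") -> List[str]:
    """Render the snapshot as delimited text lines, sorted by row then column."""
    return [
        sep.join([row, column, str(ts), value])
        for (row, column), (ts, value) in sorted(snapshot.items())
    ]


if __name__ == "__main__":
    cells = [
        ("r1", "cf:a", 100, "old"),
        ("r1", "cf:a", 200, "new"),
        ("r2", "cf:a", 300, "late"),
    ]
    for line in to_delimited(snapshot_as_of(cells, 150)):
        print(line)
```
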
> J-D, we have about 5 million rows (but each could have multiple versions)
> for now. So I think scanning the whole table is okay for now. But it seems
> it may not be the best option for a big table. Also, as I mentioned
> earlier, I think Hive/Pig does not let you access HBase by timestamp. If
> they could do that, it's the approach I would want to take.
>
> But your suggestion of using *export* got me thinking, and the following
> may work out well:
> 1. Export the HBase table for a given timestamp using the "*export*" utility.
> 2. Import the file into another "temp" HBase table.
> 3. Use Pig/Hive to extract the table and write it to an HDFS file in plain
> text (or into an RDBMS).
> 4. Let the client retrieve the file.
>
> Shahab, in my case, I was talking about using the internal timestamp. But
> thanks for your input - I was unaware of the Pig DBStorage loader! It may
> come in handy in some other scenario.
>
>
> Thanks,
> Gaurav
>
>
>
> On Mon, May 6, 2013 at 1:50 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:
>
> > Gaurav, when you say that you want older versions of the data, are you
> > talking about filtering on the internal timestamps (and hence the
> > internal versioning mechanism), or does your data have a separate column
> > (basically custom versioning) for versioning? If the latter, then you can
> > use Pig. It can also dump your data directly into an RDBMS like MySQL,
> > since a DBStorage loader/store is available.
> >
> > Might not be totally applicable to your issue but just wanted to share a
> > thought.
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, May 6, 2013 at 1:40 PM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:
> >
> > > Thanks J-D.
> > >
> > > Wouldn't the export utility export the data in sequence file format? My
> > > goal is to generate data in some sort of delimited plain text file and
> > > hand it over to the caller.
> > >
> > > - Gaurav
> > >
> > >
> > > On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]> wrote:
> > >
> > > > You can use the Export MR job provided with HBase; it lets you set a
> > > > time range: http://hbase.apache.org/book.html#export
> > > >
> > > > J-D
> > > >
> > > >
> > > > On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi HBase users,
> > > > >
> > > > > We have a use case where we need to know how data looked at a given
> > > > > time in the past.
> > > > >
> > > > > The data is stored in HBase, of course, with multiple versions. The
> > > > > goal is to be able to extract all records (rowkey, columns) as of a
> > > > > given timestamp to a file.
> > > > >
> > > > >
> > > > > I am trying to figure out the best way to achieve this.
> > > > >
> > > > > The options I know are:
> > > > > 1. Write a *Java* client using the HBase Java API, and scan the
> > > > > HBase table.
> > > > > 2. Do the same, but over the HBase *Thrift* API using Perl (since
> > > > > our environment is mostly Perl).
> > > > > 3. Use *Hive* to point to the HBase table, and use Sqoop to extract
> > > > > data from the Hive table onto the client / RDBMS.
> > > > > 4. Use *Pig* to extract data from the HBase table, dump it on HDFS,
> > > > > and move the file over to the client.
> > > > >
> > > > > So far, I have successfully implemented option (2). I am still
> > > > > running some tests to see how it performs, but it works fine as such.
> > > > >
> > > > > My questions are:
> > > > > 1. Is option (3) or (4) even possible? I am not sure if we can