HBase >> mail # user >> Extract a whole table for a given time(stamp)


Re: Extract a whole table for a given time(stamp)
Obviously I don't know much about your use case, so hopefully this won't
turn into a game of "yes but I also need X" ;)

It sounds like you don't have a lot of data to retrieve? Since your first
and second options are to scan the whole table, it might be that the table
itself is small. If it's small then any option is good and it's just a
matter of writing some code.

Then, options 3 and 4 will write multiple files unless you use only 1
reducer so, since you already need to merge files, you could consider
having a post step that converts the multiple sequence files into 1 tsv
file. Or you could have your own version of Export that has a single
reducer that writes in the tsv format. The possibilities are endless.
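
For reference, the Export run with a time range plus the merge post step
could look something like this. The table name, paths, and cutoff time
are made up for illustration, and the commands are echoed rather than
executed so the sketch works without a running cluster:

```shell
# Hypothetical names throughout. Export's positional arguments are:
#   Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
# Timestamps are epoch milliseconds; a range of 0..END_TS captures the
# table "as of" END_TS.
END_TS=$(( $(date -u -d '2013-05-06 00:00' +%s) * 1000 ))

# Export one version of each cell, up to the cutoff (echoed, not run):
echo "hbase org.apache.hadoop.hbase.mapreduce.Export mytable /tmp/mytable-export 1 0 $END_TS"

# If a custom job writes plain-text tsv part files instead, getmerge can
# concatenate them into a single local file (echoed, not run):
echo "hadoop fs -getmerge /tmp/mytable-tsv /tmp/mytable.tsv"
```

Note that getmerge is a simple concatenation, so it only makes sense
once the output is already plain text, not sequence files.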

Hope this helps,

J-D
On Mon, May 6, 2013 at 10:40 AM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:

> Thanks J-D.
>
> Wouldn't the export utility export the data in sequence file format? My
> goal is to generate data in some sort of delimited plain text file and hand
> it over to the caller.
>
> - Gaurav
>
>
> On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
> wrote:
>
> > You can use the Export MR job provided with HBase; it lets you set a time
> > range: http://hbase.apache.org/book.html#export
> >
> > J-D
> >
> >
> > On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi Hbase users,
> > >
> > > We have a use case where we need to know how the data looked at a
> > > given time in the past.
> > >
> > > The data is stored in HBase of course, with multiple versions. And, the
> > > goal is to be able to extract all records (rowkey, columns) as of a
> > > given timestamp, to a file.
> > >
> > >
> > > I am trying to figure out the best way to achieve this.
> > >
> > > The options I know are:
> > > 1. Write a *Java* client using the HBase Java API, and scan the hbase
> > > table.
> > > 2. Do the same, but over the *Thrift* HBase API using Perl (since
> > > our environment is mostly Perl).
> > > 3. Use *Hive* to point to the HBase table, and use Sqoop to extract data
> > > from the Hive table and onto the client / RDBMS.
> > > 4. Use *Pig* to extract data from the HBase table, dump it on HDFS, and
> > > move the file over to the client.
> > >
> > > So far, I have successfully implemented option (2). I am still running
> > > some tests to see how it performs, but it works fine as such.
> > >
> > > My questions are:
> > > 1. Is option (3) or (4) even possible? I am not sure if we can access
> > > the table for a given timestamp over Pig or Hive.
> > > 2. Is there any other better way of achieving this?
> > >
> > >
> > > Thanks!
> > > Gaurav
> > >
> >
>