HBase user mailing list: Extract a whole table for a given time(stamp)


Thread:
Gaurav Pandit 2013-05-06, 17:27
Jean-Daniel Cryans 2013-05-06, 17:33
Gaurav Pandit 2013-05-06, 17:40
Shahab Yunus 2013-05-06, 17:50
Gaurav Pandit 2013-05-06, 18:19

Re: Extract a whole table for a given time(stamp)
You could save some time by using
http://hbase.apache.org/book.html#copytable

J-D
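A minimal sketch of a CopyTable run with a time-range cutoff, assuming a hypothetical source table "mytable", destination table "mytable_asof", and cutoff timestamp; --starttime, --endtime, and --new.name are the options documented at the link above:

    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --starttime=0 --endtime=1367861940000 \
        --new.name=mytable_asof mytable

Since CopyTable writes straight into another HBase table, it can stand in for the Export-then-Import steps in the plan quoted below; the destination table still has to be created first.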
On Mon, May 6, 2013 at 11:19 AM, Gaurav Pandit <[EMAIL PROTECTED]> wrote:

> Thanks for your inputs, J-D, Shahab.
>
> Sorry if I was ambiguous in stating what I wanted to do. Just to restate
> the goal in one line:
> "Extract all rows (with rowkey, columns) from an HBase table as of a given
> time using HBase timestamps/versions, in a plain text file format."
>
> J-D, we have about 5 million rows (but each could have multiple versions)
> for now, so I think scanning the whole table is okay for the moment, but
> it may not be the best option for a big table. Also, as I mentioned
> earlier, I think Hive/Pig does not let you query HBase by timestamp. If
> they could do that, that is the approach I would want to take.
>
> But your suggestion of using *export* got me thinking, and the following
> may work out well (the first two steps are sketched below):
> 1. Export the HBase table for a given timestamp using the "*export*" utility.
> 2. Import the file into another "temp" HBase table.
> 3. Use Pig/Hive to extract the table and put it in an HDFS file in plain
> text (or onto an RDBMS).
> 4. Let the client retrieve the file.
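A minimal sketch of steps 1 and 2, assuming a hypothetical source table "mytable", temp table "mytable_temp", output directory "/tmp/mytable_export", and cutoff timestamp; per the HBase book, Export takes optional <versions>, <starttime>, and <endtime> arguments after the table name and output directory:

    hbase org.apache.hadoop.hbase.mapreduce.Export \
        mytable /tmp/mytable_export 1 0 1367861940000

    hbase org.apache.hadoop.hbase.mapreduce.Import \
        mytable_temp /tmp/mytable_export

Note that the temp table must already exist (with matching column families) before the Import job runs; Import does not create it.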
>
> Shahab, in my case, I was talking about using the internal timestamp. But
> thanks for your input - I was unaware of the Pig DBStorage loader! It may
> come in handy in some other scenario.
>
>
> Thanks,
> Gaurav
>
>
>
> On Mon, May 6, 2013 at 1:50 PM, Shahab Yunus <[EMAIL PROTECTED]>
> wrote:
>
> > Gaurav, when you say that you want older versions of the data, are you
> > talking about filtering on the internal timestamps (and hence the
> > internal versioning mechanism), or does your data have a separate column
> > (basically using custom versioning) for versioning? If the latter, then
> > you can use Pig. It can dump your data directly into an RDBMS like MySQL
> > too, as a DBStorage loader/store is available.
> >
> > Might not be totally applicable to your issue but just wanted to share a
> > thought.
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, May 6, 2013 at 1:40 PM, Gaurav Pandit <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Thanks J-D.
> > >
> > > Wouldn't the export utility export the data in sequence file format? My
> > > goal is to generate data in some sort of delimited plain text file and
> > > hand it over to the caller.
> > >
> > > - Gaurav
> > >
> > >
> > > On Mon, May 6, 2013 at 1:33 PM, Jean-Daniel Cryans <[EMAIL PROTECTED]>
> > > wrote:
> > >
> > > > You can use the Export MR job provided with HBase; it lets you set a
> > > > time range: http://hbase.apache.org/book.html#export
> > > >
> > > > J-D
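For reference, the Export driver's argument order, as documented in the HBase book linked above, is:

    hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

so a <starttime>/<endtime> pair gives the time-range filter J-D mentions.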
> > > >
> > > >
> > > > On Mon, May 6, 2013 at 10:27 AM, Gaurav Pandit <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > > Hi HBase users,
> > > > >
> > > > > We have a use case where we need to know how data looked at a given
> > > > > time in the past.
> > > > >
> > > > > The data is stored in HBase of course, with multiple versions. And
> > > > > the goal is to be able to extract all records (rowkey, columns) as
> > > > > of a given timestamp, to a file.
> > > > >
> > > > > I am trying to figure out the best way to achieve this.
> > > > >
> > > > > The options I know are:
> > > > > 1. Write a *Java* client using the HBase Java API, and scan the
> > > > > HBase table (a scan along these lines is sketched after this list).
> > > > > 2. Do the same, but over the *Thrift* HBase API using Perl (since
> > > > > our environment is mostly Perl).
> > > > > 3. Use *Hive* to point to the HBase table, and use Sqoop to extract
> > > > > data from the Hive table onto the client / RDBMS.
> > > > > 4. Use *Pig* to extract data from the HBase table, dump it on HDFS,
> > > > > and move the file over to the client.
> > > > >
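A minimal Java sketch of option (1), assuming the 0.94-era HBase client API, a hypothetical table name, and a hypothetical cutoff timestamp. Scan.setTimeRange takes a half-open [min, max) range, and setMaxVersions(1) keeps only the newest version inside that range, which is the value "as of" the cutoff as long as the table retains enough versions:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class AsOfScan {
        public static void main(String[] args) throws IOException {
            long cutoffTs = 1367861940000L;             // hypothetical "as of" timestamp (ms)
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable"); // hypothetical table name
            Scan scan = new Scan();
            scan.setTimeRange(0L, cutoffTs);            // only cells written before the cutoff
            scan.setMaxVersions(1);                     // newest version within that range
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result result : scanner) {
                    for (KeyValue kv : result.raw()) {
                        // emit rowkey, family:qualifier, value as tab-delimited text
                        System.out.println(Bytes.toString(kv.getRow()) + "\t"
                                + Bytes.toString(kv.getFamily()) + ":"
                                + Bytes.toString(kv.getQualifier()) + "\t"
                                + Bytes.toString(kv.getValue()));
                    }
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }

The same time-range filter is available over Thrift (option 2), so a Perl client can apply the identical cutoff logic.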
> > > > > So far, I have successfully implemented option (2). I am still
> > > > > running some tests to see how it performs, but it works fine as
> > > > > such.
> > > > > My questions are:
> > > > > 1. Is option (3) or (4) even possible? I am not sure if we can
Other messages in this thread:
Gaurav Pandit 2013-05-06, 18:34
Jean-Daniel Cryans 2013-05-06, 17:49