Re: Storing Pig output into HBase tables
Thanks Alan,

I also definitely need this functionality, and I plan to write it
soon. I was actually in the process of doing what you explained, but
I was stuck on the best way to specify the name of the HBase table
in which to store the data (and also the associated storage schema)
using the "store A into B using C;" paradigm. Do you have any
recommendations about that?
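
One convention that might fit the "store A into B using C;" form, sketched here under assumptions only: put the table name in B as a location string such as 'hbase://daily_stats', and pass the storage schema to C as a constructor argument such as 'stats:hits stats:bytes'. The HBaseStoreSpec class, the hbase:// prefix, and the space-separated family:qualifier format below are all hypothetical illustrations, not an existing Pig or HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Pig or HBase. */
final class HBaseStoreSpec {
    final String tableName;
    /** Each entry is a two-element array: {family, qualifier}. */
    final List<String[]> columns;

    private HBaseStoreSpec(String tableName, List<String[]> columns) {
        this.tableName = tableName;
        this.columns = columns;
    }

    /**
     * Parses the "B" part (location) and the argument given to "C" (columnSpec).
     * Assumed formats: "hbase://table" and "family:qualifier family:qualifier ...".
     */
    static HBaseStoreSpec parse(String location, String columnSpec) {
        String prefix = "hbase://";
        String table = location.startsWith(prefix)
                ? location.substring(prefix.length())
                : location;
        if (table.isEmpty()) {
            throw new IllegalArgumentException("missing table name in: " + location);
        }
        List<String[]> cols = new ArrayList<>();
        for (String token : columnSpec.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty spec yields one empty token
            }
            int sep = token.indexOf(':');
            if (sep <= 0) {
                throw new IllegalArgumentException(
                        "expected family:qualifier, got: " + token);
            }
            cols.add(new String[] {token.substring(0, sep), token.substring(sep + 1)});
        }
        return new HBaseStoreSpec(table, cols);
    }
}
```

With this convention, the store function would call something like HBaseStoreSpec.parse(location, constructorArg) once, then map the fields of each tuple onto the parsed columns in order.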

Alan Gates wrote:
> In order to store information in HBase, you will need to use an
> OutputFormat that is HBase compatible.  There exists a TableOutputFormat
> in HBase that will write data.  The trick is to get Pig to use that
> OutputFormat.  It is possible, but Pig does not yet do a good job of
> making it easy.
> You will need to write a StoreFunc that returns TableOutputFormat from
> getStorePreparationClass.  You will then need to have the putNext call
> in StoreFunc write to TableOutputFormat's RecordWriter.  For an example
> of how to do this, see
> contrib/zebra/src/java/org/apache/hadoop/zebra/pig/TableStorer.java in
> Pig's contrib directory.
> Alan.
> On Sep 9, 2009, at 6:20 PM, Liu Xianglong wrote:
>> Hi, Alan. I am interested in this store function. Would you mind
>> sending me some details?
>> --------------------------------------------------
>> From: "Alan Gates" <[EMAIL PROTECTED]>
>> Sent: Thursday, September 10, 2009 4:34 AM
>> Subject: Re: Storing Pig output into HBase tables
>>> I do not know if there is a general HBase load/import tool.  That
>>> would be a good question for the hbase-user list.
>>> Right now Pig does not have a store function to write data into
>>> HBase.  It is possible to write such a function.  If you are
>>> interested I can send you specific details on how to do it.
>>> Alan.
>>> On Aug 19, 2009, at 12:49 PM, Nikhil Gupta wrote:
>>>> Hi all,
>>>> I am working on building an analytics engine of sorts that takes
>>>> daily server logs, crunches the data using Pig scripts, and (for
>>>> now) outputs the data to HDFS. Later, this data is to be stored in
>>>> HBase to enable efficient querying from the front end.
>>>> Currently, I am searching for efficient ways of moving the Pig
>>>> output on HDFS into HBase tables. Though this seems to be a very
>>>> basic task, I could not find any easy way of doing it, except for
>>>> writing some Java code. The problem is that I'll have many
>>>> different kinds of output formats, and writing Java code to load
>>>> each such file seems wrong. Probably I am missing something.
>>>> Is there any way of storing Pig output directly in an HBase table
>>>> [loading is possible with HBaseStorage, but that does not cover
>>>> storing]? Or is there any general data load/import tool for HBase?
>>>> Thanks!
>>>> Nikhil Gupta
>>>> Graduate Student,
>>>> Stanford University
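
For reference, the approach Alan describes in this thread (a store function whose putNext call writes through to an output format's record writer) can be sketched roughly as follows. This is a self-contained illustration only: SketchRecordWriter, SketchTableStorer, and InMemoryWriter are hypothetical stand-ins, not the Pig StoreFunc interface or the HBase TableOutputFormat classes, and treating the first tuple field as the row key is an assumption. For the real wiring, see the TableStorer.java file referenced above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an output format's record writer. */
interface SketchRecordWriter {
    void write(String rowKey, List<String> values);
    void close();
}

/** Hypothetical stand-in for a store function that delegates to a record writer. */
final class SketchTableStorer {
    private final SketchRecordWriter writer;

    SketchTableStorer(SketchRecordWriter writer) {
        this.writer = writer;
    }

    /**
     * Forwards one tuple to the record writer.
     * Assumption: field 0 is the row key, remaining fields are column values.
     */
    void putNext(List<String> tuple) {
        if (tuple.isEmpty()) {
            throw new IllegalArgumentException("tuple has no row key field");
        }
        List<String> values = new ArrayList<>(tuple.subList(1, tuple.size()));
        writer.write(tuple.get(0), values);
    }

    /** Called once after the last tuple; releases the writer. */
    void finish() {
        writer.close();
    }
}

/** In-memory writer used only to exercise the sketch without a cluster. */
final class InMemoryWriter implements SketchRecordWriter {
    final List<String> rows = new ArrayList<>();
    boolean closed = false;

    @Override
    public void write(String rowKey, List<String> values) {
        rows.add(rowKey + "=" + values);
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

The point of the design is that the store function holds no HBase-specific logic itself; it only translates each tuple into a (row key, values) pair and hands it to whatever writer the output format supplies.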