Re: Storing Pig output into HBase tables
Vincent BARAT 2009-09-19, 17:13
I also definitely need this functionality, and I plan to write it
soon. I was actually in the process of doing what you explained, but
I was blocked on the best way to specify the name of the HBase
table in which to store the data (and also the associated storage
schema) using the "store A into B using C;" paradigm. Do you have
any recommendations about that?
Alan Gates wrote:
> In order to store information in HBase, you will need to use an
> OutputFormat that is HBase compatible. There exists a TableOutputFormat
> in HBase that will write data. The trick is to get Pig to use that
> OutputFormat. It is possible, but Pig does not yet do a good job of
> making it easy.
> You will need to write a StoreFunc that returns TableOutputFormat from
> getStoragePreparationClass. You will then need to have the putNext call
> in StoreFunc write to TableOutputFormat's RecordWriter. For an example
> of how to do this, see
> contrib/zebra/src/java/org/apache/hadoop/zebra/pig/TableStorer.java in
> Pig's contrib directory.
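
The delegation pattern described above (a store function whose putNext call forwards each tuple to a record writer obtained from a table-backed output format) can be sketched roughly as follows. This is only an illustrative model under stated assumptions: `RowWriter`, `InMemoryTableOutput`, and `TableStoreFunc` are hypothetical stand-ins written for this sketch, not Pig's StoreFunc, Hadoop's RecordWriter, or HBase's TableOutputFormat, and the rule that the first tuple field is the row key is also an assumption. The table name "daily_stats" is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HBaseStoreSketch {

    /** Hypothetical stand-in for a record writer; NOT the real Hadoop/HBase class. */
    public interface RowWriter {
        void write(String rowKey, List<String> values);
    }

    /** Hypothetical stand-in for a table-backed output format; it only collects rows in memory. */
    public static class InMemoryTableOutput {
        public final String tableName;
        public final List<String> rows = new ArrayList<>();

        public InMemoryTableOutput(String tableName) {
            this.tableName = tableName;
        }

        /** Hands out the writer that the store function will forward tuples to. */
        public RowWriter getRecordWriter() {
            return (rowKey, values) -> rows.add(rowKey + "=" + String.join(",", values));
        }
    }

    /** Hypothetical store function: putNext forwards each tuple to the record writer. */
    public static class TableStoreFunc {
        private final RowWriter writer;

        public TableStoreFunc(InMemoryTableOutput output) {
            this.writer = output.getRecordWriter();
        }

        /** Assumption: the first field is the row key, the remaining fields are column values. */
        public void putNext(List<String> tuple) {
            if (tuple.isEmpty()) {
                throw new IllegalArgumentException("tuple has no row key field");
            }
            writer.write(tuple.get(0), tuple.subList(1, tuple.size()));
        }
    }

    public static void main(String[] args) {
        InMemoryTableOutput output = new InMemoryTableOutput("daily_stats");
        TableStoreFunc storer = new TableStoreFunc(output);
        storer.putNext(Arrays.asList("2009-09-19", "42", "17"));
        System.out.println(output.tableName + " -> " + output.rows);
    }
}
```

In a real implementation the in-memory collector would be replaced by the actual output format and its record writer, and the tuple-to-row mapping would depend on the storage schema chosen for the table.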
> On Sep 9, 2009, at 6:20 PM, Liu Xianglong wrote:
>> Hi, Alan. I am interested in this store function; would you mind sending
>> me some details?
>> From: "Alan Gates" <[EMAIL PROTECTED]>
>> Sent: Thursday, September 10, 2009 4:34 AM
>> To: <[EMAIL PROTECTED]>
>> Subject: Re: Storing Pig output into HBase tables
>>> I do not know if there is a general hbase load/import tool. That
>>> would be a good question for the hbase-user list.
>>> Right now Pig does not have a store function to write data into
>>> hbase. It is possible to write such a function. If you are
>>> interested I can send you specific details on how to do it.
>>> On Aug 19, 2009, at 12:49 PM, Nikhil Gupta wrote:
>>>> Hi all,
>>>> I am working on building an analytics kind of engine which takes
>>>> daily server
>>>> logs, crunches the data using Pig scripts and (for now) outputs
>>>> data to
>>>> HDFS. Later, this data is to be stored on HBase to enable efficient
>>>> from front-end.
>>>> Currently, I am searching for efficient ways of moving the Pig
>>>> output on
>>>> HDFS to the HBase tables. Though this seems to be a very basic
>>>> task, I could
>>>> not find any easy way of doing that, except for writing some Java
>>>> code. The
>>>> problem is I'll have many different kinds of output formats, and
>>>> writing java
>>>> code for loading each such file seems wrong. Probably I am missing
>>>> something.
>>>> Is there any way of storing Pig output directly in an HBase table
>>>> [loading is
>>>> possible by HBaseStorage, but that doesn't talk of storing]. Or is
>>>> there any
>>>> general data load/import tool for HBase?
>>>> Nikhil Gupta
>>>> Graduate Student,
>>>> Stanford University