Pig >> mail # user >> Storing Pig output into HBase tables


Re: Storing Pig output into HBase tables
Thanks Alan,

I also definitely need this functionality, and I plan to write it
soon. I was actually in the process of doing what you explained, but
I was blocked on the best way to specify the name of the HBase
table where to store the data (and also the associated storage
schema) using the "store A into B using C;" paradigm. Do you have
any recommendation about that?
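For illustration, the kind of syntax being asked about might look like this. This is a hypothetical sketch only: `HBaseTableStorer` and its column-list argument are invented names for the store function being discussed, not an existing Pig class.

```pig
-- Hypothetical storer: class name and argument format are illustrative only.
-- The table name rides in the "into" clause; the column mapping rides in
-- the constructor argument of the "using" clause.
raw = LOAD 'daily_logs' AS (rowkey:chararray, hits:int);
STORE raw INTO 'my_hbase_table' USING HBaseTableStorer('cf:hits');
```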

Alan Gates wrote:
> In order to store information in HBase, you will need to use an
> OutputFormat that is HBase compatible.  There exists a TableOutputFormat
> in Hbase that will write data.  The trick is to get Pig to use that
> OutputFormat.  It is possible, but Pig does not yet do a good job of
> making it easy.
>
> You will need to write a StoreFunc that returns TableOutputFormat from
> getStoragePreparationClass.  You will then need to have the putNext call
> in StoreFunc write to TableOutputFormat's RecordWriter.  For an example
> of how to do this, see
> contrib/zebra/src/java/org/apache/hadoop/zebra/pig/TableStorer.java in
> Pig's contrib directory.
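The approach Alan describes might be sketched roughly as follows. This is an untested outline, assuming the Pig 0.x-era StoreFunc interface and the pre-0.20 HBase `mapred.TableOutputFormat`; the method names come from the description above, but signatures and HBase types may differ across releases, so treat every detail here as an assumption to check against your versions.

```java
// Rough sketch only -- the interface shape and HBase API details are
// assumptions, not verified against a specific Pig/HBase release.
public class HBaseStorer implements org.apache.pig.StoreFunc {

    // Tell Pig which OutputFormat prepares the storage target,
    // as described above.
    public Class getStoragePreparationClass() {
        return org.apache.hadoop.hbase.mapred.TableOutputFormat.class;
    }

    // Called once per output tuple: convert the tuple into an HBase
    // update and hand it to TableOutputFormat's RecordWriter.
    public void putNext(org.apache.pig.data.Tuple t)
            throws java.io.IOException {
        // byte[] rowKey = ...;  // e.g. derived from the tuple's first field
        // BatchUpdate update = new BatchUpdate(rowKey);
        // update.put("cf:col", valueBytes);
        // writer.write(new ImmutableBytesWritable(rowKey), update);
    }

    // bindTo(...) / finish() omitted; see
    // contrib/zebra/src/java/org/apache/hadoop/zebra/pig/TableStorer.java
    // in Pig's contrib directory for a complete working example.
}
```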
>
> Alan.
>
> On Sep 9, 2009, at 6:20 PM, Liu Xianglong wrote:
>
>> Hi, Alan. I am interested in this store function; would you mind sending
>> me some details?
>>
>> --------------------------------------------------
>> From: "Alan Gates" <[EMAIL PROTECTED]>
>> Sent: Thursday, September 10, 2009 4:34 AM
>> To: <[EMAIL PROTECTED]>
>> Subject: Re: Storing Pig output into HBase tables
>>
>>> I do not know if there is a general hbase load/import tool.  That  
>>> would be a good question for the hbase-user list.
>>>
>>> Right now Pig does not have a store function to write data into  
>>> hbase. It is possible to write such a function.  If you are  
>>> interested I can send you specific details on how to do it.
>>>
>>> Alan.
>>>
>>> On Aug 19, 2009, at 12:49 PM, Nikhil Gupta wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am working on building an analytics-style engine that takes daily
>>>> server logs, crunches the data using Pig scripts, and (for now)
>>>> outputs data to HDFS. Later, this data is to be stored in HBase to
>>>> enable efficient querying from the front-end.
>>>>
>>>> Currently, I am searching for efficient ways of moving the Pig output
>>>> on HDFS to the HBase tables. Though this seems to be a very basic
>>>> task, I could not find any easy way of doing that, except for writing
>>>> some Java code. The problem is I'll have many different kinds of
>>>> output formats, and writing Java code for loading each such file
>>>> seems wrong. Probably I am missing something.
>>>>
>>>> Is there any way of storing Pig output directly in an HBase table?
>>>> [Loading is possible via HBaseStorage, but that doesn't cover
>>>> storing.] Or is there any general data load/import tool for HBase?
>>>>
>>>> Thanks!
>>>> Nikhil Gupta
>>>> Graduate Student,
>>>> Stanford University
>>>
>
>
>