Pig user mailing list: Writing to HBase table from Pig script


Thread:
  Byte Array 2013-03-11, 11:29
  yonghu 2013-03-11, 12:04
  Byte Array 2013-03-11, 12:09
Re: Writing to HBase table from Pig script
Bill Graham 2013-03-11, 17:03
Store is an operator that doesn't get assigned to a relation. Instead of
this:

copy = store results into 'hbase://results' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:res1, cf:res2');

try this:

store results into 'hbase://results' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:res1, cf:res2');
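
As a minimal end-to-end sketch, here is the corrected flow assembled from the
statements in this thread (the table and column names are the thread's own;
the processing that produces 'results' is elided):

table = LOAD 'hbase://temp'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
            'cf:c1, cf:c2', '-loadKey true')
        AS (key:chararray, c1:bytearray, c2:bytearray);

-- ... processing that produces the relation 'results' ...

-- STORE is a statement, not an expression, so nothing is assigned to it:
STORE results INTO 'hbase://results'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:res1, cf:res2');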

On Mon, Mar 11, 2013 at 5:09 AM, Byte Array <[EMAIL PROTECTED]> wrote:

> I use HBase 0.94.4
>
>
>
> On 03/11/2013 01:04 PM, yonghu wrote:
>
>> What HBase version do you use?
>>
>> On Mon, Mar 11, 2013 at 12:29 PM, Byte Array <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hello!
>>>
>>> I successfully read from an HBase table using:
>>>
>>> table = load 'hbase://temp' using
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:c1, cf:c2',
>>> '-loadKey true') as (key:chararray, c1:bytearray, c2:bytearray)
>>>
>>> I used a UDF to parse the column data and convert it from bytearrays
>>> into doubles.
>>>
>>> I do some processing and manage to dump the results:
>>> dump results;
>>>
>>> which prints:
>>> ((product1-20131231-20100101,1.5,1.5))
>>> ((product2-20131231-20100101,2.5,2.5))
>>>
>>> However, I cannot write these results into a newly created empty HBase
>>> table:
>>> copy = store results into 'hbase://results' using
>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:res1, cf:res2');
>>>
>>> I have also tried .. store results into 'results' using .., but it
>>> doesn't
>>> help.
>>> I am using pig-0.11.0.
>>>
>>> I suspect I should do some sort of casting into bytearrays using a UDF,
>>> like I did when reading the table (a sketch appears after the thread).
>>>
>>> This is the exception I get:
>>> java.io.IOException: java.lang.IllegalArgumentException: No columns to insert
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:470)
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>>>      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>>>      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
>>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>>>      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
>>> Caused by: java.lang.IllegalArgumentException: No columns to insert
>>>      at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:970)
>>>      at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:763)
>>>      at org.apache.hadoop.hbase.client.HTable.put(HTable.java:749)
>>>      at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:123)
>>>      at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:84)
>>>      at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:885)
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>>>      at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:588)
>>>      at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>>      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.

Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.
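
The poster's hunch about casting suggests one more avenue worth sketching. The
"No columns to insert" error means the Put that HBaseStorage built contained
no column values, and the double parentheses in the dump output hint that
'results' may hold a single nested tuple per row rather than separate key and
column fields. A hedged sketch of a possible fix, not a confirmed diagnosis
(the field names below are hypothetical):

-- Flatten the nested tuple into top-level fields; HBaseStorage uses the
-- first field as the row key and the remaining fields as column values.
fixed = FOREACH results GENERATE
            FLATTEN($0) AS (key:chararray, res1:double, res2:double);

STORE fixed INTO 'hbase://results'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:res1, cf:res2');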