Pig >> mail # user >> Using merge join from a HBaseStorage


Re: Using merge join from a HBaseStorage
Hello,

I've made a patch for this issue. You can find all the details about it here:
https://issues.apache.org/jira/browse/PIG-2495

Kevin
Capptain.com - Pilot your apps!

2012/1/24 Kevin Lion <[EMAIL PROTECTED]>

> Hi,
>
> To increase the performance of my computation, I would like to use a merge
> join between two tables. I wrote this code to do that:
>
> pigServer.registerQuery(
>     "start_sessions = LOAD 'hbase://startSession.localhost' "
>     + "USING HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') "
>     + "AS (sid:chararray, infoid:chararray, imei:chararray, start:long);");
> pigServer.registerQuery(
>     "end_sessions = LOAD 'hbase://endSession.localhost' "
>     + "USING HBaseStorage('meta:sid meta:timestamp meta:locid') "
>     + "AS (sid:chararray, end:long, locid:chararray);");
> pigServer.registerQuery(
>     "sessions = JOIN start_sessions BY sid, end_sessions BY sid USING 'merge';");
> pigServer.deleteFile(Analytics.getOutputFilePath("sessions"));
> pigServer.store("sessions", Analytics.getOutputFilePath("sessions"), "BinStorage");
>
>
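[Editor's note: Pig's 'merge' join assumes both inputs are already sorted on the join key; per the Pig documentation, the output of an ORDER statement is one accepted way to satisfy that requirement when the loader itself cannot guarantee sort order. An untested Pig Latin sketch of that variant, reusing the table and field names from the code above:]

```pig
-- Sketch, not a verified fix: sort both relations on the join key (sid)
-- before the merge join, since 'merge' requires sorted inputs.
start_sessions = LOAD 'hbase://startSession.localhost'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'meta:sid meta:infoid meta:imei meta:timestamp')
    AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
end_sessions = LOAD 'hbase://endSession.localhost'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'meta:sid meta:timestamp meta:locid')
    AS (sid:chararray, end:long, locid:chararray);

-- ORDER produces output that merge join accepts as sorted input.
sorted_start = ORDER start_sessions BY sid;
sorted_end   = ORDER end_sessions BY sid;
sessions     = JOIN sorted_start BY sid, sorted_end BY sid USING 'merge';
```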
> During the first job of the computation, I got this error:
>
> java.io.IOException: Could not create instance of class
> org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to
> de-serialize it. (no default constructor?)
>     at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:235)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:336)
>     at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
>     at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
>     at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
>     at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>     at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:113)
>     at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.InstantiationException: org.apache.pig.backend.hadoop.hbase.HBaseStorage$1
>     at java.lang.Class.newInstance0(Class.java:340)
>     at java.lang.Class.newInstance(Class.java:308)
>     at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:231)
>     ... 13 more
>
> I have no idea about the origin of the problem. Does anyone have an idea?
>
> Thanks,
> Kevin
>
>
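[Editor's note: the "(no default constructor?)" hint in the trace is the key. HBaseStorage$1 is an anonymous inner class, and the Caused-by frames show BinInterSedes.readWritable instantiating it reflectively via Class.newInstance(), which requires a no-arg constructor; an anonymous inner class's compiler-generated constructor takes the enclosing instance instead, so instantiation fails. A minimal, self-contained Java sketch of the failure mode — class and method names below are illustrative, not from Pig:]

```java
public class AnonCtorDemo {
    interface Writable {}

    // In an instance context, the anonymous class captures the enclosing
    // AnonCtorDemo, so its only constructor takes that instance as an
    // argument -- there is no nullary constructor to call reflectively.
    Writable makeAnon() {
        return new Writable() {};
    }

    public static void main(String[] args) throws Exception {
        Class<?> anonClass = new AnonCtorDemo().makeAnon().getClass();
        try {
            // Mirrors the reflective call visible in the Caused-by frames.
            anonClass.newInstance();
            System.out.println("instantiated");
        } catch (InstantiationException e) {
            System.out.println("InstantiationException: no default constructor");
        }
    }
}
```

Running this prints the same failure the reducer hit: deserialization by reflection cannot construct an anonymous inner class.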