Pig, mail # user - Using merge join from a HBaseStorage


Using merge join from a HBaseStorage
Kevin Lion 2012-01-24, 16:58
Hi,

To improve the performance of my computation, I would like to use a merge join
between two tables. I wrote this code to do that:

pigServer.registerQuery("start_sessions = LOAD 'hbase://startSession.localhost' "
    + "USING HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') "
    + "AS (sid:chararray, infoid:chararray, imei:chararray, start:long);");
pigServer.registerQuery("end_sessions = LOAD 'hbase://endSession.localhost' "
    + "USING HBaseStorage('meta:sid meta:timestamp meta:locid') "
    + "AS (sid:chararray, end:long, locid:chararray);");
pigServer.registerQuery("sessions = JOIN start_sessions BY sid, "
    + "end_sessions BY sid USING 'merge';");
pigServer.deleteFile(Analytics.getOutputFilePath("sessions"));
pigServer.store("sessions", Analytics.getOutputFilePath("sessions"), "BinStorage");

During the first job of the computation, I get this error:

java.io.IOException: Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
    at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:235)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:336)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
    at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
    at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
    at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:113)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.InstantiationException: org.apache.pig.backend.hadoop.hbase.HBaseStorage$1
    at java.lang.Class.newInstance0(Class.java:340)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:231)
    ... 13 more
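
The only part I can reproduce outside Pig is the exception type itself. The sketch below is a minimal standalone illustration, assuming that HBaseStorage$1 is an anonymous class (the $1 suffix suggests this, but I have not confirmed it in the Pig source). The names AnonymousInstantiation, Value and tryInstantiate are made up for the example and are not Pig or Hadoop APIs. It shows Class.newInstance() failing with an InstantiationException when a class has no no-argument constructor, which matches the "Caused by" part of the trace.

```java
public class AnonymousInstantiation {
    // Hypothetical stand-in for a serializable value; not a Pig or Hadoop type.
    interface Value {
        String describe();
    }

    // Returns an instance of an anonymous class that captures a local
    // variable, so its only constructor takes an argument.
    static Value anonymousValue(final String label) {
        return new Value() {
            @Override
            public String describe() {
                return label;
            }
        };
    }

    // Tries to create a fresh instance reflectively, as a generic
    // deserializer might, and reports which outcome occurred.
    @SuppressWarnings("deprecation")
    static String tryInstantiate(Class<?> type) {
        try {
            type.newInstance();
            return "ok";
        } catch (InstantiationException e) {
            return "InstantiationException";
        } catch (IllegalAccessException e) {
            return "IllegalAccessException";
        }
    }

    public static void main(String[] args) {
        Class<?> type = anonymousValue("sid").getClass();
        System.out.println(type.getName() + " -> " + tryInstantiate(type));
    }
}
```

This only reproduces the mechanism of the exception. It does not explain why the merge join ends up serializing such an object in the first place.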

I have no idea where this problem comes from. Does anyone have an idea?

Thanks,
Kevin