Pig >> mail # user >> Using merge join from a HBaseStorage


Using merge join from a HBaseStorage
Hi,

To increase the performance of my computation, I would like to use a merge join
between two tables. I wrote this code to do that:

pigServer.registerQuery("start_sessions = LOAD hbase://startSession.localhost USING HBaseStorage('meta:sid meta:infoid meta:imei meta:timestamp') AS (sid:chararray, infoid:chararray, imei:chararray, start:long);");
pigServer.registerQuery("end_sessions = LOAD hbase://endSession.localhost USING HBaseStorage('meta:sid meta:timestamp meta:locid') AS (sid:chararray, end:long, locid:chararray);");
pigServer.registerQuery("sessions = JOIN start_sessions BY sid, end_sessions BY sid USING 'merge';");
pigServer.deleteFile(Analytics.getOutputFilePath("sessions"));
pigServer.store("sessions", Analytics.getOutputFilePath("sessions"), "BinStorage");
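For reference, the embedded queries above correspond to this standalone Pig Latin script (a sketch only: the table and column names are taken from the code above, and the output path is a hypothetical stand-in for Analytics.getOutputFilePath("sessions")):

```pig
-- Same LOAD/JOIN/STORE sequence as the PigServer calls above.
start_sessions = LOAD 'hbase://startSession.localhost'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'meta:sid meta:infoid meta:imei meta:timestamp')
    AS (sid:chararray, infoid:chararray, imei:chararray, start:long);

end_sessions = LOAD 'hbase://endSession.localhost'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'meta:sid meta:timestamp meta:locid')
    AS (sid:chararray, end:long, locid:chararray);

-- A merge join requires both inputs to be sorted on the join key.
sessions = JOIN start_sessions BY sid, end_sessions BY sid USING 'merge';

-- Hypothetical output path standing in for Analytics.getOutputFilePath("sessions").
STORE sessions INTO 'output/sessions' USING BinStorage();
```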
During the first job of the computation, I got this error:

java.io.IOException: Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
    at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:235)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:336)
    at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
    at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
    at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
    at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:113)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.InstantiationException: org.apache.pig.backend.hadoop.hbase.HBaseStorage$1
    at java.lang.Class.newInstance0(Class.java:340)
    at java.lang.Class.newInstance(Class.java:308)
    at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:231)
    ... 13 more

I have no idea about the origin of the problem. Does anyone have an idea?

Thanks,
Kevin
Kevin Lion 2012-03-06, 13:58