MapReduce >> mail # dev >> Problems setting custom class for the Pluggable Sort in MapReduce Next Generation


Problems setting custom class for the Pluggable Sort in MapReduce Next Generation
Hello,

I'm developing a custom map output buffer that uses replacement selection
instead of quicksort. It's available here:
https://bitbucket.org/pmdusso/hadoop-replacement-selection-sort/overview
It is based on the new pluggable sort interface introduced in MAPREDUCE-2454
(https://issues.apache.org/jira/browse/MAPREDUCE-2454).
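Replacement selection, the technique the buffer uses in place of quicksort, turns a stream of records into sorted runs with a fixed-size heap: each record popped from the heap is appended to the current run, and a newly read record joins that run only if it is not smaller than the record just written; otherwise it is held back for the next run. On random input this yields runs averaging about twice the heap capacity. The plain-Java sketch below illustrates only that run-generation step on int keys. It is not the code in the linked repository, it does not implement the MAPREDUCE-2454 collector interface, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical illustration of replacement selection run generation.
 * Not the MAPREDUCE-2454 collector API and not the linked repository's code.
 */
final class ReplacementSelection {

    /** A buffered key tagged with the run it has been assigned to. */
    private static final class Entry {
        final int run;
        final int key;

        Entry(int run, int key) {
            this.run = run;
            this.key = key;
        }
    }

    /** Splits input into sorted runs using a heap holding at most capacity keys. */
    static List<List<Integer>> runs(int[] input, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        // Order by run first, then key: keys deferred to the next run are
        // polled only after every key still eligible for the current run.
        PriorityQueue<Entry> heap = new PriorityQueue<>((a, b) -> a.run != b.run
                ? Integer.compare(a.run, b.run)
                : Integer.compare(a.key, b.key));
        List<List<Integer>> runs = new ArrayList<>();
        int next = 0;
        // Fill the in-memory buffer.
        while (next < input.length && heap.size() < capacity) {
            heap.add(new Entry(0, input[next++]));
        }
        while (!heap.isEmpty()) {
            Entry smallest = heap.poll();
            if (smallest.run == runs.size()) {
                runs.add(new ArrayList<>()); // first key of a new run
            }
            runs.get(smallest.run).add(smallest.key);
            if (next < input.length) {
                int key = input[next++];
                // A key smaller than the one just written cannot extend this run.
                int run = key >= smallest.key ? smallest.run : smallest.run + 1;
                heap.add(new Entry(run, key));
            }
        }
        return runs;
    }
}
```

For example, runs(new int[] {5, 1, 7, 3, 8, 2, 6, 4}, 3) returns [[1, 3, 5, 7, 8], [2, 4, 6]]: the first run is longer than the heap capacity of 3.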

I've been testing it successfully on a single-node installation. I
configure the job when creating it, like this:

    conf.set("io.serializations",
        "io.serialization.WritableSerializationWithZeroEndingText");
    conf.set("mapreduce.job.map.output.collector.class",
        "pluggable.MapOutputHeapWithMetadataHeap");

I used to build a runnable jar and run it normally with java -jar ... But
now I would like to try it on a multi-node cluster (which runs normal jobs
fine). I removed this hardcoded configuration and started calling the jar
like this:

    hadoop jar jars/wordCount.jar \
        -Dmapreduce.task.io.sort.mb=16 \
        -Dmapreduce.job.map.output.collector.class=pluggable.MapOutputHeapWithMetadataHeap \
        -Dio.serializations=io.serialization.WritableSerializationWithZeroEndingText \
        /wordcount/words /wordcount/output/out
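For these -D options to become job configuration properties, the driver's main() has to route its arguments through Hadoop's GenericOptionsParser, usually by implementing Tool and launching through ToolRunner; `hadoop jar` itself passes them to main() unchanged. The sketch below is a simplified, hypothetical stand-in that mimics only the split between -Dname=value pairs and the remaining positional arguments, in plain Java so it runs without Hadoop on the classpath. It is not Hadoop's parser, which also handles other generic options such as -libjars, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the -D handling of a generic options parser. */
final class DashDSplitter {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remaining = new ArrayList<>();

    DashDSplitter(String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // "-Dname=value": the text between -D and the first '=' is the key.
                properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // Anything else (here, the input and output paths) is left for the job.
                remaining.add(arg);
            }
        }
    }
}
```

Fed the arguments from the command above, it collects three properties and leaves /wordcount/words and /wordcount/output/out as the remaining arguments.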

But I can't get this to work. I keep getting a ClassNotFoundException:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:383)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
    ... 10 more
Caused by: java.lang.ClassNotFoundException: Class pluggable.MapOutputHeapWithMetadataHeap.class not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
    ... 11 more
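The innermost cause comes from Configuration.getClassByName, which resolves the configured string as a binary class name against the configuration's class loader. The name in the trace reads pluggable.MapOutputHeapWithMetadataHeap.class, with a trailing .class, while the command above passes it without one. A minimal plain-Java check follows, under the assumption that the lookup behaves like Class.forName; the helper name is hypothetical. With a trailing .class, the JVM looks for a class literally named "class" in a package named after the intended class, so the lookup fails even when the real class is on the classpath.

```java
/** Hypothetical diagnostic helper: does a binary class name resolve? */
final class ClassNameCheck {
    static boolean loads(String name, ClassLoader loader) {
        try {
            // initialize = false: only resolve the name, do not run static initializers.
            Class.forName(name, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

With this helper, loads("java.util.ArrayList", loader) returns true while loads("java.util.ArrayList.class", loader) returns false, so it may be worth confirming that the collector class value reaching the task is exactly pluggable.MapOutputHeapWithMetadataHeap.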

I have two projects: one for jobs such as wordCount and grep, and one where
I'm developing my custom output buffer (the Bitbucket repository linked
above). Because of this, I tried several jar configurations:

   - The jobs project with a *project* dependency on the buffer project in
   Eclipse, exported as a runnable jar both with the required libraries
   packaged into it and with them copied into a folder.
   - The jobs project with a jar generated from the custom output buffer
   project added to it.
   - A fat jar built with Maven in the jobs project.

All of them failed. I would appreciate any help, since there seems to be
very little information about this online. If I'm missing some important
information, please let me know and I will provide it.
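Since all three packaging variants failed the same way, one quick check is whether the job jar actually has the collector class as a top-level entry. Eclipse's "Package required libraries into generated JAR" option, for instance, nests dependency jars inside the runnable jar rather than unpacking their classes, and whether the task's class loader sees classes inside such nested jars is worth verifying rather than assuming. The sketch below, with a hypothetical class name, uses only java.util.jar and looks for the entry path derived from a binary class name; it does not look inside nested jars.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.jar.JarFile;

/** Hypothetical helper: is a class present as a top-level entry of a jar? */
final class JarClassCheck {
    /**
     * Maps "pluggable.MapOutputHeapWithMetadataHeap" to the entry
     * "pluggable/MapOutputHeapWithMetadataHeap.class" and looks it up.
     * Classes that exist only inside a nested jar are not found.
     */
    static boolean containsClass(Path jar, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile file = new JarFile(jar.toFile())) {
            return file.getEntry(entryName) != null;
        }
    }
}
```

Calling containsClass(Paths.get("jars/wordCount.jar"), "pluggable.MapOutputHeapWithMetadataHeap") on the jar from the command above would show whether the class is a top-level entry there.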

Best regards,

Pedro Martins Dusso

 