MapReduce >> mail # user >> Too large class path for map reduce jobs


Re: Too large class path for map reduce jobs
Fragmentation of Hadoop classpaths is another issue: Hadoop should
split the classpath (CP) into three:

1*client CP: what is needed to submit a job (only the nachos)
2*server CP (JT/NN/TT/DN): what is needed to run the cluster (the whole
enchilada)
3*job CP: what is needed to run a job (some of the enchilada)

But I'm not trying to get into that here. Look at how a task's classpath
is laid out today:
-----
# Hadoop JARs:

/Users/tucu/dev-apps/hadoop/conf
/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/lib/tools.jar
/Users/tucu/dev-apps/hadoop/bin/..
/Users/tucu/dev-apps/hadoop/bin/../hadoop-core-0.20.3-CDH3-SNAPSHOT.jar
/Users/tucu/dev-apps/hadoop/bin/../lib/aspectjrt-1.6.5.jar

..... (about 30 jars from hadoop lib/ )

/Users/tucu/dev-apps/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar

# Job JARs (for a job with only 2 JARs):

/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/-2707763075630339038_639898034_1993697040/localhost/user/tucu/oozie-tucu/0000003-101004184132247-oozie-tucu-W/java-node--java/java-launcher.jar
/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/3613772770922728555_-588832047_1993624983/localhost/user/tucu/examples/apps/java-main/lib/oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/tucu/jobcache/job_201010041326_0058/attempt_201010041326_0058_m_000000_0/work
-----
What I'm suggesting is that the latter group, the job JARs, be soft-linked
(by the TT) into the working directory, so that their part of the classpath
is just:

-----
java-launcher.jar
oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
.
-----
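
To make the idea concrete, here is a minimal sketch of the linking step, not actual TaskTracker code. The class name `LinkedClasspath` and the method `linkAndBuildClasspath` are made up for illustration, and it assumes the platform allows symbolic links and that the child JVM is started with the working directory as its current directory, so the bare JAR names resolve.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
public class LinkedClasspath {

    /**
     * Soft-links each job JAR into workDir and returns a classpath made of
     * bare JAR names plus "." for the working directory itself.
     */
    static String linkAndBuildClasspath(Path workDir, List<Path> jobJars)
            throws IOException {
        List<String> entries = new ArrayList<>();
        for (Path jar : jobJars) {
            Path name = jar.getFileName();
            // Throws FileAlreadyExistsException if two JARs share a file
            // name, which is the obvious limitation of a flat directory.
            Files.createSymbolicLink(workDir.resolve(name), jar.toAbsolutePath());
            entries.add(name.toString());
        }
        entries.add(".");
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for a distcache directory and a task working directory.
        Path root = Files.createTempDirectory("cp-demo");
        Path cache = Files.createDirectories(root.resolve("distcache"));
        Path work = Files.createDirectories(root.resolve("work"));
        Path jar = Files.createFile(cache.resolve("java-launcher.jar"));

        System.out.println(linkAndBuildClasspath(work, List.of(jar)));
    }
}
```

The Hadoop JARs would still be listed with their full paths; only the job JARs shrink to names.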
Alejandro

On Wed, Oct 6, 2010 at 7:57 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:

>  Hi Alejandro,
>
>    yes, it can of course be done right (sorry if my wording seemed to imply
> otherwise). Just saying that I think that Hadoop M/R should not go into that
> class loader / module separation business. It's one Job, one VM, right? So
> the problem is to assign just the stuff needed to let the Job do its
> business without becoming an obstacle.
>
>   Must admit I didn't understand your proposal 2. How would that remove
> (e.g.) jetty libs from the job's classpath?
>
> Thanks,
>   Henning
>
> On Wednesday, 06.10.2010, at 18:28 +0800, Alejandro Abdelnur wrote:
>
>  1. Classloader business can be done right. Actually it could be done as
> spec-ed for servlet web-apps.
>
>
>
>  2. If the issue is strictly 'too large classpath', then a simpler
> solution would be to soft-link all JARs to the current directory and create
> the classpath with the JAR names only (no path). Note that the soft-linking
> business is already supported by the DistributedCache. So the changes would
> be mostly in the TT, to create the JAR-names-only classpath before starting
> the child.
>
>
>
>  Alejandro
>
>
>
>  On Wed, Oct 6, 2010 at 5:57 PM, Henning Blohm <[EMAIL PROTECTED]>
> wrote:
>
>  Hi Tom,
>
>   that's exactly it. Thanks! I don't think that I can comment on the issues
> in Jira so I will do it here.
>
>   Playing tricks with class paths and deviating from the default class
> loading delegation has never been anything but short-term relief. Fixing
> things by imposing a "better" order of stuff on the class path will not work
> when people actually use child loaders (as the parent wins), like we do. Also
> it may easily lead to very confusing situations, because an earlier part of
> the class path is incomplete and picks up other stuff from a later part,
> etc. No good.
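
The "parent wins" behaviour can be seen with a small sketch. `ParentFirstDemo`, `RecordingLoader` and the class name `example.job.MissingClass` are hypothetical, chosen only for illustration. The child loader here never defines anything; it just records which names the default delegation in `ClassLoader.loadClass` hands down to it.

```java
import java.util.ArrayList;
import java.util.List;

public class ParentFirstDemo {

    /** Hypothetical child loader that only records what reaches findClass. */
    static class RecordingLoader extends ClassLoader {
        final List<String> askedOfChild = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Only reached after the parent chain has failed to load 'name'.
            askedOfChild.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingLoader child =
                new RecordingLoader(ParentFirstDemo.class.getClassLoader());

        // The parent chain can load this, so the child is never consulted.
        child.loadClass("java.util.ArrayList");
        System.out.println("child asked: " + child.askedOfChild);

        // The parent chain cannot load this, so the request falls to the child.
        try {
            child.loadClass("example.job.MissingClass");
        } catch (ClassNotFoundException expected) {
            System.out.println("child asked: " + child.askedOfChild);
        }
    }
}
```

Whatever sits on the parent's class path is resolved there first, so reordering entries seen by the child cannot override it.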
>
>   Child loaders are good for module separation but should not be used to
> "hide" type visibility from the parent. That almost certainly leads to a
> Class Loader Constraint Violation once you lose control (which usually
> happens earlier than expected).
>
>   The suggestion to reduce the Job class path to the required minimum is
> the most practical approach. There is some gray area there, of course, and
> it will not be feasible to reach the absolute minimal set of types, but
> something reasonable is achievable, i.e. the Hadoop core that suffices to
> run the job.