Re: Too large class path for map reduce jobs
Fragmentation of Hadoop classpaths is another issue: Hadoop should
differentiate the CP into 3:

1*client CP: what is needed to submit a job (only the nachos)
2*server CP (JT/NN/TT/DD): what is needed to run the cluster (the whole
enchilada)
3*job CP: what is needed to run a job (some of the enchilada)

But I'm not trying to get into that here. What I'm suggesting is:
# Hadoop JARs:


..... (about 30 jars from hadoop lib/ )


# Job JARs (for a job with only 2 JARs):

What I'm suggesting is that the latter group, the job JARs, be soft-linked
(by the TT) into the working directory, so that their classpath is just the
JAR names (no paths).
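
To make the soft-linking idea concrete, here is a minimal Java sketch. It is an illustration under assumptions, not the TaskTracker's actual code: the class JobClasspathSketch, the helper linkAndBuildClasspath, and the JAR names job-a.jar and job-b.jar are all hypothetical. It links each job JAR into a working directory and builds a classpath from the bare file names.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JobClasspathSketch {

    /**
     * Soft-links each job JAR into workDir and returns a classpath that
     * contains only the JAR file names (no directories). The names resolve
     * correctly only if the child JVM is started with workDir as its
     * current directory.
     */
    public static String linkAndBuildClasspath(Path workDir, List<Path> jobJars)
            throws IOException {
        List<String> names = new ArrayList<>();
        for (Path jar : jobJars) {
            Path name = jar.getFileName();
            Path link = workDir.resolve(name);
            // Skip entries left over from an earlier run; do not follow links.
            if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                // May throw on platforms or file systems without symlink support.
                Files.createSymbolicLink(link, jar.toAbsolutePath());
            }
            names.add(name.toString());
        }
        return String.join(File.pathSeparator, names);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-ins for the job's JARs and the task working directory.
        Path cache = Files.createTempDirectory("cache");
        Path workDir = Files.createTempDirectory("work");
        Path a = Files.createFile(cache.resolve("job-a.jar"));
        Path b = Files.createFile(cache.resolve("job-b.jar"));

        String cp = linkAndBuildClasspath(workDir, Arrays.asList(a, b));
        System.out.println(cp);
    }
}
```

The resulting string stays short no matter how deep the real JAR locations are, which is the whole point when the problem is strictly the length of the classpath. It does not by itself remove any Hadoop JARs from what the job sees.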


On Wed, Oct 6, 2010 at 7:57 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:

>  Hi Alejandro,
>    yes, it can of course be done right (sorry if my wording seemed to imply
> otherwise). Just saying that I think that Hadoop M/R should not go into that
> class loader / module separation business. It's one Job, one VM, right? So
> the problem is to assign just the stuff needed to let the Job do its
> business without becoming an obstacle.
>   Must admit I didn't understand your proposal 2. How would that remove
> (e.g.) jetty libs from the job's classpath?
> Thanks,
>   Henning
> On Wednesday, 06.10.2010, at 18:28 +0800, Alejandro Abdelnur wrote:
>  1. Classloader business can be done right. Actually it could be done as
> spec-ed for servlet web-apps.
>  2. If the issue is strictly 'too large classpath', then a simpler
> solution would be to soft-link all JARs to the current directory and create
> the classpath with the JAR names only (no path). Note that the soft-linking
> business is already supported by the DistributedCache. So the changes would
> be mostly in the TT to create the JAR-names-only classpath before starting
> the child.
>  Alejandro
>  On Wed, Oct 6, 2010 at 5:57 PM, Henning Blohm <[EMAIL PROTECTED]>
> wrote:
>  Hi Tom,
>   that's exactly it. Thanks! I don't think that I can comment on the issues
> in Jira so I will do it here.
>   Playing tricks with class paths and deviating from the default class loading
> delegation has never been anything but short-term relief. Fixing things
> by imposing a "better" order of stuff on the class path will not work when
> people actually use child loaders (as the parent wins), like we do. Also,
> it may easily lead to very confusing situations because the earlier part of
> the class path is not complete and gets other stuff from a later part, and
> so on. No good.
>   Child loaders are good for module separation but should not be used to
> "hide" type visibility from the parent. That almost certainly leads to a
> class loader constraint violation once you lose control (which usually
> happens earlier than expected).
>   The suggestion to reduce the Job class path to the required minimum is
> the most practical approach. There is some gray area there, of course, and it
> will not be feasible to reach the absolute minimal set of types, but
> something reasonable is, i.e. the Hadoop core that suffices to run the job.