MapReduce >> mail # user >> Too large class path for map reduce jobs


Re: Too large class path for map reduce jobs
Well, if the issue is a too-long classpath, the soft-link thingy will give
some room to breathe, as the total CP length will be much smaller.

A
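To put rough numbers on "the total CP length will be much smaller": this sketch takes the job-JAR entries from the first CP quoted further down and rewrites them as bare names, with the task work dir becoming ".". It only touches the job entries. Under this proposal the ~30 Hadoop lib JARs keep their full paths. `ClasspathLength` and its methods are illustrative names, not Hadoop code. The separator is the platform's `File.pathSeparator` (":" on the Mac used in the thread).

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: compares the full-path job classpath with the
// bare-name form proposed in the thread.
class ClasspathLength {

    // Job classpath entries copied from the first CP quoted in the thread.
    static final List<String> FULL_JOB_CP = List.of(
        "/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/-2707763075630339038_639898034_1993697040/localhost/user/tucu/oozie-tucu/0000003-101004184132247-oozie-tucu-W/java-node--java/java-launcher.jar",
        "/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/3613772770922728555_-588832047_1993624983/localhost/user/tucu/examples/apps/java-main/lib/oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar",
        "/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/tucu/jobcache/job_201010041326_0058/attempt_201010041326_0058_m_000000_0/work");

    // Keeps only the file name of each JAR; the work dir itself becomes ".".
    static List<String> shortEntries(List<String> fullEntries, String workDir) {
        List<String> out = new ArrayList<>();
        for (String entry : fullEntries) {
            if (entry.equals(workDir)) {
                out.add(".");
            } else {
                Path name = Paths.get(entry).getFileName();
                out.add(name.toString());
            }
        }
        return out;
    }

    // Joins entries with the platform classpath separator.
    static String join(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String workDir = FULL_JOB_CP.get(FULL_JOB_CP.size() - 1);
        String full = join(FULL_JOB_CP);
        String shortCp = join(shortEntries(FULL_JOB_CP, workDir));
        System.out.println("full:  " + full.length() + " chars");
        System.out.println("short: " + shortCp.length() + " chars -> " + shortCp);
    }
}
```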

On Thu, Oct 7, 2010 at 3:43 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:

>  So that's actually another issue, right? Besides splitting the classpath
> into those three groups, you want the TT to create soft-links on demand to
> simplify the computation of the classpath string. Is that right?
>
> But it's the TT that actually starts the job VM. Why does it matter what
> the string actually looks like, as long as it has the right content?
>
> Thanks,
>   Henning
>
>
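One reason the string's form matters even though the TT starts the VM itself: the classpath has to be handed to the new process as text, and relative entries are resolved against that process's working directory. This is a minimal sketch, assuming the TT passes the classpath as a `-classpath` argument on the child JVM's command line and sets the child's working directory to the task work dir. `ChildJvmLauncher` and the main class name are hypothetical, not Hadoop code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: builds (but does not start) a child JVM process.
class ChildJvmLauncher {

    static ProcessBuilder build(File workDir, String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        // The classpath travels verbatim as one command-line argument, so its
        // length counts against the operating system's argument-size limits.
        cmd.add("-classpath");
        cmd.add(classpath);
        cmd.add(mainClass);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Relative entries such as "java-launcher.jar" or "." are resolved
        // against the child's working directory, so it must be the work dir.
        pb.directory(workDir);
        return pb;
    }

    public static void main(String[] args) {
        File workDir = new File("work"); // hypothetical task work dir
        String cp = String.join(File.pathSeparator,
            "java-launcher.jar", "oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar", ".");
        ProcessBuilder pb = build(workDir, cp, "org.example.HypotheticalMain");
        System.out.println("cwd:  " + pb.directory());
        System.out.println("argv: " + pb.command());
    }
}
```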
> On Thu, 2010-10-07 at 13:22 +0800, Alejandro Abdelnur wrote:
>
> [sent too soon]
>
>  The first CP shown is what a task's CP looks like today. If we change it
> to pick up all the job JARs from the current dir, then the classpath will be
> much shorter (second CP shown). We can easily achieve this by soft-linking
> the job JARs into the work dir of the task.
>
>  Alejandro
>
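The soft-linking step could look roughly like this from the TT's side. This is a minimal sketch using `java.nio` symbolic links. `JobJarLinker` and `linkAndBuildClasspath` are hypothetical names, and the temp directories in `main` merely stand in for the DistributedCache and task work directories. The returned classpath mirrors the short CP in the thread: bare JAR names followed by ".".

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Hadoop code.
class JobJarLinker {

    // Soft-links each job JAR into the work dir under its bare file name and
    // returns a classpath made only of those names plus "." (the work dir).
    static String linkAndBuildClasspath(List<Path> jobJars, Path workDir) throws IOException {
        List<String> entries = new ArrayList<>();
        for (Path jar : jobJars) {
            Path name = jar.getFileName();
            Path link = workDir.resolve(name);
            // Absolute target, so the link stays valid wherever it lives.
            Files.createSymbolicLink(link, jar.toAbsolutePath());
            entries.add(name.toString());
        }
        entries.add(".");
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for a DistributedCache directory and the task work dir.
        Path cache = Files.createTempDirectory("distcache");
        Path work = Files.createTempDirectory("work");
        Path jar = Files.createFile(cache.resolve("java-launcher.jar"));
        System.out.println(linkAndBuildClasspath(List.of(jar), work));
    }
}
```

One design point this exposes: two job JARs with the same file name from different cache directories would collide. `createSymbolicLink` would then fail with an `IOException` (typically `FileAlreadyExistsException`), so a real implementation would need a naming policy for that case.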
>  On Thu, Oct 7, 2010 at 1:02 PM, Alejandro Abdelnur <[EMAIL PROTECTED]>
> wrote:
>
> Fragmentation of Hadoop classpaths is another issue: Hadoop should
> split the CP into 3:
>
>   1. client CP: what is needed to submit a job (only the nachos)
>
>   2. server CP (JT/NN/TT/DN): what is needed to run the cluster (the whole
> enchilada)
>
>   3. job CP: what is needed to run a job (some of the enchilada)
>
>   But I'm not trying to get into that here. What I'm suggesting is:
>
>   -----
>   # Hadoop JARs:
>
>   /Users/tucu/dev-apps/hadoop/conf
>   /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/lib/tools.jar
>   /Users/tucu/dev-apps/hadoop/bin/..
>   /Users/tucu/dev-apps/hadoop/bin/../hadoop-core-0.20.3-CDH3-SNAPSHOT.jar
>   /Users/tucu/dev-apps/hadoop/bin/../lib/aspectjrt-1.6.5.jar
>   ..... (about 30 jars from hadoop lib/ )
>   /Users/tucu/dev-apps/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>
>   # Job JARs (for a job with only 2 JARs):
>
>   /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/-2707763075630339038_639898034_1993697040/localhost/user/tucu/oozie-tucu/0000003-101004184132247-oozie-tucu-W/java-node--java/java-launcher.jar
>   /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/3613772770922728555_-588832047_1993624983/localhost/user/tucu/examples/apps/java-main/lib/oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
>   /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/tucu/jobcache/job_201010041326_0058/attempt_201010041326_0058_m_000000_0/work
>   -----
>
>   What I'm suggesting is that the latter group, the job JARs, be
> soft-linked (by the TT) into the working directory; then their classpath is
> just:
>
>   -----
>   java-launcher.jar
>   oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
>   .
>   -----
>
>   Alejandro
>
>   On Wed, Oct 6, 2010 at 7:57 PM, Henning Blohm <[EMAIL PROTECTED]>
> wrote:
>
>   Hi Alejandro,
>
>    Yes, it can of course be done right (sorry if my wording seemed to imply
> otherwise). I'm just saying that I think Hadoop M/R should not get into that
> class loader / module separation business. It's one job, one VM, right? So
> the problem is to assign just what the job needs to do its business,
> without that getting in its way.
>
>   I must admit I didn't understand your proposal 2. How would that remove
> (e.g.) Jetty libs from the job's classpath?
>
> Thanks,
>   Henning
>
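Point 1 below refers to class loading as specified for servlet web apps. The Servlet specification recommends that a web application's class loader load classes packaged in the application in preference to container-wide libraries, while never letting the application override core `java.*` classes. The sketch below is a minimal child-first `URLClassLoader` under that model. `ChildFirstClassLoader` is a hypothetical name, not a Hadoop or servlet-container API, and real containers exclude more packages than `java.*`. It changes precedence rather than visibility: framework JARs such as Jetty remain reachable through the parent as a fallback.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first ("parent-last") loader, in the spirit of the
// servlet web-app class loading model.
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] jobJars, ClassLoader parent) {
        super(jobJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core classes always come from the parent.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Look in the job's own JARs first...
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // ...then fall back to the framework's classpath.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(
                new URL[0], ChildFirstClassLoader.class.getClassLoader())) {
            // No job JARs given, so this resolves through the parent.
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}
```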
> On Wednesday, 06.10.2010, at 18:28 +0800, Alejandro Abdelnur wrote:
>
>
>
>  1. The classloader business can be done right. In fact, it could be done as
> specified for servlet web apps.
>
>
> 2. If the issue is strictly 'too large classpath', then a simpler solution
> would be to soft-link all JARs to the current directory and create the
> classpath with the JAR names only (no path). Note that the soft-linking
> business is already supported by the DistributedCache. So the changes would
> be mostly in the TT to create the JAR-names-only classpath before starting