NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
MapReduce >> mail # user >> Too large class path for map reduce jobs


Re: Too large class path for map reduce jobs
[sent too soon]

The first CP shown is what a task's CP looks like today. If we change it to
pick up all the job JARs from the current dir, the classpath will be much
shorter (second CP shown). We can easily achieve this by soft-linking the
job JARs into the work dir of the task.
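A minimal Java sketch of the soft-linking idea, assuming the child JVM is started with the task's work dir as its current directory (the original CP ends with that work dir), since relative classpath entries are resolved against the JVM's current directory. `linkJobJars` is a hypothetical helper, not TaskTracker code; it uses only `java.nio.file`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class JobClasspathLinker {

    /**
     * Hypothetical helper (not TaskTracker code): soft-links each job JAR into
     * the task work dir and returns a classpath of bare JAR names plus ".".
     */
    static String linkJobJars(List<Path> jobJars, Path workDir) throws IOException {
        List<String> entries = new ArrayList<>();
        for (Path jar : jobJars) {
            String name = jar.getFileName().toString();
            Path link = workDir.resolve(name);
            // work/<name> -> absolute JAR path (e.g. somewhere under taskTracker/distcache).
            // If a link with this name already exists, the first JAR keeps it.
            if (!Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                Files.createSymbolicLink(link, jar.toAbsolutePath());
            }
            entries.add(name);
        }
        // "." is the work dir itself, which today is the last entry of the task CP.
        entries.add(".");
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        Path cache = Files.createTempDirectory("distcache");
        Path work = Files.createTempDirectory("work");
        Path jar = Files.createFile(cache.resolve("java-launcher.jar"));
        // The returned entries are relative, so they only resolve correctly
        // if the child JVM is started with 'work' as its current directory.
        System.out.println(linkJobJars(List.of(jar), work));
    }
}
```

Because the links keep the original file names, two job JARs with the same name from different cache directories would collide; this sketch simply keeps whichever was linked first.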

Alejandro

On Thu, Oct 7, 2010 at 1:02 PM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:

> Fragmentation of Hadoop classpaths is another issue: Hadoop should
> differentiate the CP into 3:
>
> 1* client CP: what is needed to submit a job (only the nachos)
> 2* server CP (JT/NN/TT/DD): what is needed to run the cluster (the whole
> enchilada)
> 3* job CP: what is needed to run a job (some of the enchilada)
>
> But I'm not trying to get into that here. What I'm suggesting is:
>
>
> -----
> # Hadoop JARs:
>
> /Users/tucu/dev-apps/hadoop/conf
> /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/lib/tools.jar
> /Users/tucu/dev-apps/hadoop/bin/..
> /Users/tucu/dev-apps/hadoop/bin/../hadoop-core-0.20.3-CDH3-SNAPSHOT.jar
> /Users/tucu/dev-apps/hadoop/bin/../lib/aspectjrt-1.6.5.jar
>
> ..... (about 30 jars from hadoop lib/ )
>
> /Users/tucu/dev-apps/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>
> # Job JARs (for a job with only 2 JARs):
>
>
> /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/-2707763075630339038_639898034_1993697040/localhost/user/tucu/oozie-tucu/0000003-101004184132247-oozie-tucu-W/java-node--java/java-launcher.jar
>
> /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/3613772770922728555_-588832047_1993624983/localhost/user/tucu/examples/apps/java-main/lib/oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
>
> /Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/tucu/jobcache/job_201010041326_0058/attempt_201010041326_0058_m_000000_0/work
> -----
>
>
> What I'm suggesting is that the latter group, the job JARs, be soft-linked
> (by the TT) into the working directory; then their classpath is just:
>
> -----
> java-launcher.jar
> oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar
> .
> -----
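To make the saving concrete, the Java snippet below takes the three job entries from the task classpath quoted above and prints the length of today's absolute-path form next to the names-only form. It covers only the job part; the ~30 Hadoop lib JARs are unchanged by the proposal. Treating every non-`.jar` entry as the work dir (and mapping it to `.`) is an assumption of this sketch.

```java
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class ClasspathLengthDemo {

    // The job entries of the task classpath quoted above (two JARs + work dir).
    static final List<String> JOB_ENTRIES = List.of(
        "/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/-2707763075630339038_639898034_1993697040/localhost/user/tucu/oozie-tucu/0000003-101004184132247-oozie-tucu-W/java-node--java/java-launcher.jar",
        "/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/distcache/3613772770922728555_-588832047_1993624983/localhost/user/tucu/examples/apps/java-main/lib/oozie-examples-2.2.1-CDH3B3-SNAPSHOT.jar",
        "/Users/tucu/dev-apps/hadoop/dirs/mapred/taskTracker/tucu/jobcache/job_201010041326_0058/attempt_201010041326_0058_m_000000_0/work");

    // Today's form: absolute paths joined with ':' (these are Unix-style paths).
    static String absoluteClasspath(List<String> entries) {
        return String.join(":", entries);
    }

    // Proposed form: bare JAR names, and '.' for the work dir
    // (assumption: any entry that is not a .jar is the work dir).
    static String namesOnlyClasspath(List<String> entries) {
        return entries.stream()
            .map(e -> e.endsWith(".jar") ? Paths.get(e).getFileName().toString() : ".")
            .collect(Collectors.joining(":"));
    }

    public static void main(String[] args) {
        String before = absoluteClasspath(JOB_ENTRIES);
        String after = namesOnlyClasspath(JOB_ENTRIES);
        System.out.println("absolute:   " + before.length() + " chars");
        System.out.println("names only: " + after.length() + " chars -> " + after);
    }
}
```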
>
>
> Alejandro
>
> On Wed, Oct 6, 2010 at 7:57 PM, Henning Blohm <[EMAIL PROTECTED]> wrote:
>
>>  Hi Alejandro,
>>
>>    yes, it can of course be done right (sorry if my wording seemed to
>> imply otherwise). Just saying that I think that Hadoop M/R should not go
>> into that class loader / module separation business. It's one Job, one VM,
>> right? So the problem is to assign just the stuff needed to let the Job do
>> its business without becoming an obstacle.
>>
>>   Must admit I didn't understand your proposal 2. How would that remove
>> (e.g.) jetty libs from the job's classpath?
>>
>> Thanks,
>>   Henning
>>
>> On Wednesday, 06.10.2010, at 18:28 +0800, Alejandro Abdelnur wrote:
>>
>>  1. Classloading can be done right. In fact, it could be done as specified
>> for servlet web apps.
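For reference, one common way to approximate web-app style isolation is a "child-first" loader: the job's own JARs are searched before the parent, while `java.*` classes always go through the normal parent chain. This is roughly in the spirit of the servlet spec, which recommends preferring classes packaged in the WAR over container-wide JARs and does not let a web app override core Java classes; the spec's exact rules are more detailed. The class below is a hypothetical sketch built only on `java.net.URLClassLoader`, not anything Hadoop provides.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical child-first loader: job classes win over the parent's, except java.*. */
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] jobJars, ClassLoader parent) {
        super(jobJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // Core classes: standard parent-first delegation, never the job's copy.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Look in the job's own JARs first...
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // ...and only then fall back to the parent (e.g. Hadoop's) classpath.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = ChildFirstClassLoader.class.getClassLoader();
        // No job JARs are given here, so every lookup falls through to the parent.
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(new URL[0], parent)) {
            System.out.println(loader.loadClass("java.lang.String") == String.class);
            System.out.println(loader.loadClass(ChildFirstClassLoader.class.getName())
                == ChildFirstClassLoader.class);
        }
    }
}
```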
>>
>>
>>
>>  2. If the issue is strictly 'too large classpath', then a simpler
>> solution would be to soft-link all JARs into the current directory and build
>> the classpath from the JAR names only (no path). Note that soft-linking
>> is already supported by the DistributedCache, so the changes would mostly
>> be in the TT, which would create the names-only classpath before starting
>> the child.
>>
>>
>>
>>  Alejandro
>>
>>
>>
>>  On Wed, Oct 6, 2010 at 5:57 PM, Henning Blohm <[EMAIL PROTECTED]>
>> wrote:
>>
>>  Hi Tom,
>>
>>   That's exactly it. Thanks! I don't think I can comment on the
>> issues in Jira, so I will do it here.
>>
>>   Tricking with class paths and deviating from the default class loading
>> delegation has never been anything but a short-term relief. Fixing things
>> by imposing a "better" order of stuff on the class path will not work when
>> people actually use child loaders (as the parent wins), like we do. Also,
>> it may easily lead to very confusing situations because the former part of
>> the class path is not complete and gets other stuff from a latter part, etc.
>> etc.... no good.
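The "parent wins" point can be seen with a plain `java.net.URLClassLoader`, whose default `loadClass` asks the parent first: even when the child has the same classes on its own URL list, the copy defined by the parent is returned, so reordering entries on the child's list changes nothing for classes the parent can already see. The demo below is a hypothetical sketch; pointing the child at this JVM's own classpath is purely for illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ParentWinsDemo {

    /** Builds URLs for every entry of this JVM's own classpath. */
    static URL[] ownClasspath() throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                urls.add(Paths.get(entry).toUri().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** Returns the loader that actually defined this class when it is requested via a child. */
    static ClassLoader definingLoaderViaChild() throws Exception {
        ClassLoader parent = ParentWinsDemo.class.getClassLoader();
        // The child lists the very same classes on its own URL path...
        try (URLClassLoader child = new URLClassLoader(ownClasspath(), parent)) {
            // ...but the default loadClass delegates to the parent first, so the parent's copy wins.
            return child.loadClass(ParentWinsDemo.class.getName()).getClassLoader();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(definingLoaderViaChild() == ParentWinsDemo.class.getClassLoader());
    }
}
```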
>>
>>   Child loaders are good for module separation but should not be used to