Re: Too large class path for map reduce jobs
Hi Henning,

I don't know if you've seen
https://issues.apache.org/jira/browse/MAPREDUCE-1938 and
https://issues.apache.org/jira/browse/MAPREDUCE-1700, which discuss
this issue.

Cheers
Tom
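
Those two JIRAs eventually produced a switch for making user-supplied classes
take precedence over the framework's own jars. Below is a minimal sketch,
assuming the mapreduce.user.classpath.first property that came out of the
MAPREDUCE-1938 work; the exact property name and its availability vary by
Hadoop version, so verify against your release:

    // Minimal sketch: prefer user-shipped jars over Hadoop's lib/ jars in
    // task JVMs. Assumes the "mapreduce.user.classpath.first" property from
    // the MAPREDUCE-1938 work; check the name for your Hadoop version.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class UserClasspathFirst {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.user.classpath.first", true);
        Job job = new Job(conf, "user-classpath-first demo");
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }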

On Fri, Sep 24, 2010 at 3:41 AM, Henning Blohm <[EMAIL PROTECTED]> wrote:
> Short update on the issue:
>
> I tried to find a way to separate classpath configurations by modifying the
> scripts in HADOOP_HOME/bin, but found that TaskRunner copies the
> classpath setting from the parent process when starting a local task, so
> I see no way to put less on a job's classpath without modifying Hadoop.
>
> As that will present a real issue when running our jobs on Hadoop, I would
> like to propose changing TaskRunner so that it sets a classpath
> specifically for M/R tasks. That classpath could be defined in the scripts
> (as for the other processes) using a dedicated environment variable (e.g.
> HADOOP_JOB_CLASSPATH). It could default to the current VM's classpath,
> preserving today's behavior (a sketch follows below).
>
> Is it ok to enter this as an issue?
>
> Thanks,
>   Henning
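
A hypothetical sketch of the proposal above. HADOOP_JOB_CLASSPATH is the
variable Henning suggests, not an existing Hadoop setting, so this is
illustrative rather than actual TaskRunner code:

    // Hypothetical: how TaskRunner could assemble the task JVM's classpath
    // under the proposal. HADOOP_JOB_CLASSPATH is the suggested variable.
    public class TaskClasspathSketch {
      static String taskClasspath() {
        String jobClasspath = System.getenv("HADOOP_JOB_CLASSPATH");
        if (jobClasspath != null && !jobClasspath.isEmpty()) {
          return jobClasspath;  // leaner, job-specific classpath
        }
        // Default: inherit the parent JVM's classpath (today's behavior).
        return System.getProperty("java.class.path");
      }
    }
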
>
>
> On Friday, 17.09.2010, at 16:01 +0000, Allen Wittenauer wrote:
>
> On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
>
>> When running map reduce tasks in Hadoop I run into classpath issues.
>> Contrary to previous posts, my problem is not that I am missing classes on
>> the task's classpath (we have a perfect solution for that) but that I find
>> too many (e.g. ECJ classes or Jetty).
>
> The fact that you mention:
>
>> The libs in HADOOP_HOME/lib seem to contain everything needed to run
>> anything in Hadoop, which is, I assume, much more than is needed to run a
>> map reduce task.
>
> hints that your perfect solution is to throw all your custom stuff into lib.
> If so, that's a huge mistake. Use the distributed cache instead (see the
> sketch below).
>
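
For completeness, a minimal sketch of Allen's suggestion: ship job-specific
jars through the distributed cache instead of dropping them into
HADOOP_HOME/lib. The HDFS path and job name here are illustrative:

    // Sketch: put a jar from HDFS on this job's task classpath via the
    // distributed cache. The /libs/my-dep-1.0.jar path is illustrative.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheJarExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The jar must already be in HDFS; only this job's tasks will see it.
        DistributedCache.addFileToClassPath(
            new Path("/libs/my-dep-1.0.jar"), conf);
        Job job = new Job(conf, "distributed-cache classpath demo");
        // ... configure mapper, reducer and IO, then:
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The same effect is available from the command line via
hadoop jar myjob.jar -libjars my-dep-1.0.jar, provided the driver goes
through ToolRunner/GenericOptionsParser.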