

Re: Large startup time in remote MapReduce job

On Jun 21, 2011, at 1:31 PM, Harsh J wrote:

> Gabor,
>
> If your jar does not contain code changes that need to get transmitted
> every time, you can consider placing them on the JT/TT classpaths

... which means you get to bounce your system every time you change code.

> and
> not do any jar registration in your job submission code. You'll see a
> related WARN but it should be OK to ignore that.
>
> If not, work on other ways to get your jar size reduced. Does it
> really contain 20 MB worth of user code or is that with libraries?

Harsh is on the right track.

Break your jar up into multiple chunks, putting the fairly static pieces into the distributed cache. See http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F for more info.
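
Before splitting the jar, it can help to answer Harsh's question about how much of the 20 MB is user code and how much is libraries. The following is only a minimal sketch using the standard java.util.jar classes, not anything Hadoop-specific. The class name JarBreakdown and the example prefix are hypothetical; substitute the package path of your own job classes.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper: sums compressed bytes in a jar, split into user code and everything else. */
final class JarBreakdown {

    /**
     * Returns compressed bytes per group: "user" for entries whose path starts
     * with userPrefix (for example "com/example/"), "other" for all remaining entries.
     */
    static Map<String, Long> breakdown(String jarPath, String userPrefix) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        totals.put("user", 0L);
        totals.put("other", 0L);
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no payload
                }
                // getCompressedSize() returns -1 when the size is unknown; count that as 0.
                long size = Math.max(entry.getCompressedSize(), 0L);
                String group = entry.getName().startsWith(userPrefix) ? "user" : "other";
                totals.merge(group, size, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarBreakdown <job.jar> <user/package/prefix/>");
            return;
        }
        breakdown(args[0], args[1])
            .forEach((group, bytes) -> System.out.println(group + ": " + bytes + " bytes"));
    }
}
```

If the "other" total dominates, those entries are the fairly static pieces that are candidates for the distributed cache, leaving only the small "user" part to be shipped with each job submission.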