Re: Large startup time in remote MapReduce job
Allen Wittenauer 2011-06-21, 20:58
On Jun 21, 2011, at 1:31 PM, Harsh J wrote:
> If your jar does not contain code changes that need to get transmitted
> every time, you can consider placing them on the JT/TT classpaths
... which means you get to bounce your system every time you change code.
> and not do any jar registration in your job submission code. You'll see a
> related WARN but it should be OK to ignore that.
> If not, work on other ways to get your jar size reduced. Does it
> really contain 20 MB worth of user code or is that with libraries?
Harsh is on the right track.
Break your jar up into multiple chunks and put the fairly static pieces into the distributed cache. See http://wiki.apache.org/hadoop/FAQ#How_do_I_submit_extra_content_.28jars.2C_static_files.2C_etc.29_for_my_job_to_use_during_runtime.3F for more info.
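
To make the split concrete, here is a rough sketch in plain Java (no Hadoop dependency) that sorts a job's jars into "upload once to the cache" and "ship with every submission", and adds up how much still has to travel each time. The jar names and the individual sizes are made up for illustration; only the 20 MB total comes from this thread. The class and method names (JarSplitPlan, of) are hypothetical helpers, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: decides which jars to cache and which to ship per job.
class JarSplitPlan {
    // Jars that rarely change: candidates for the distributed cache.
    final List<String> cachedJars = new ArrayList<String>();
    // Jars that change with each build: shipped with every submission.
    final List<String> shippedJars = new ArrayList<String>();
    long cachedBytes;
    long shippedBytes;

    // Splits jarSizes (name -> size in bytes) by membership in staticJars.
    static JarSplitPlan of(Map<String, Long> jarSizes, Set<String> staticJars) {
        JarSplitPlan plan = new JarSplitPlan();
        for (Map.Entry<String, Long> e : jarSizes.entrySet()) {
            if (staticJars.contains(e.getKey())) {
                plan.cachedJars.add(e.getKey());
                plan.cachedBytes += e.getValue();
            } else {
                plan.shippedJars.add(e.getKey());
                plan.shippedBytes += e.getValue();
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Assumed breakdown of a 20 MB job jar; names and sizes are invented.
        Map<String, Long> jars = new LinkedHashMap<String, Long>();
        jars.put("my-job-code.jar", 1 * mb);
        jars.put("big-library.jar", 15 * mb);
        jars.put("other-library.jar", 4 * mb);

        Set<String> staticJars = new HashSet<String>(
                Arrays.asList("big-library.jar", "other-library.jar"));

        JarSplitPlan plan = JarSplitPlan.of(jars, staticJars);
        System.out.println("Upload once to the cache: " + plan.cachedJars
                + " (" + plan.cachedBytes / mb + " MB)");
        System.out.println("Ship with every submission: " + plan.shippedJars
                + " (" + plan.shippedBytes / mb + " MB)");
    }
}
```

The jars in the first group would then be copied to HDFS once and registered with the distributed cache as described in the FAQ entry above, while only the second group stays in the jar you submit. How much this actually saves depends on how your 20 MB divides between your own code and the libraries.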