Hadoop, mail # user - MR job launching is slower


Re: MR job launching is slower
Michael Segel 2012-03-20, 10:54
Hi,

First, it sounds like you have two 6-core CPUs, giving you 12 cores, not 24.
Even though the OS reports 24 cores, the extra 12 come from hyper-threading and are not real cores.
This becomes an issue with respect to tuning.
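
One way to check this on a Linux node is to compare the number of logical processors with the number of distinct physical cores listed in /proc/cpuinfo. This is only a rough sketch: it assumes the x86 Linux field names `processor`, `physical id` and `core id`, and the helper name `count_cores` is made up for illustration.

```python
def count_cores(cpuinfo_text):
    """Return (logical, physical) core counts from /proc/cpuinfo-style text.

    Assumes the x86 Linux layout: one blank-line-separated stanza per logical
    processor, each carrying 'processor', 'physical id' and 'core id' fields.
    If the id fields are missing (some VMs, non-x86 kernels), the physical
    count is not meaningful.
    """
    logical = 0
    physical = set()
    for stanza in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Hyper-threaded siblings share the same (socket, core) pair,
        # so the set only grows once per real core.
        physical.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(physical)
```

Feeding it the contents of /proc/cpuinfo, for example `count_cores(open("/proc/cpuinfo").read())`, gives both counts. If my guess about your hardware is right, I would expect 24 logical processors and 12 physical cores.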

To answer your question ...

You have a single 1TB HD. That's going to be a major bottleneck in terms of performance. You usually want to have at least 1 drive per core. With a 12-core box, that's 12 spindles.

How large is your hadoop job's jar? This gets pushed around to all of the nodes.
Bigger jars take longer to process and handle.

Having said that, the startup time isn't out of whack.
It depends on what job you're launching and what you are doing within the job. Remember that the tasks have to report back to the JT.

Do you have Ganglia up and running?
You will probably see a high load on the CPUs and then a lot of I/O wait.
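
If Ganglia isn't set up yet, a quick stand-in on a Linux node is to sample /proc/stat twice and see what share of CPU time went to iowait in between. This is a minimal sketch, not a replacement for proper monitoring. It assumes the standard Linux field order on the aggregate `cpu` line, and the function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return (iowait, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:]]
            # Field order: user nice system idle iowait irq softirq steal ...
            # Later fields (guest, guest_nice) are already counted in
            # user/nice, so only the first eight are summed.
            return values[4], sum(values[:8])
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat snapshots."""
    wait0, total0 = parse_cpu_line(before)
    wait1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (wait1 - wait0) / elapsed if elapsed else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return the iowait percentage."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return iowait_percent(before, after)
```

Running `sample_iowait()` on a worker node while a job is active gives a rough number. A consistently large value would point at the single disk rather than the CPUs.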

HTH

-Mike

On Mar 20, 2012, at 5:40 AM, praveenesh kumar wrote:

> I have a 10-node cluster (around 24 CPUs, 48 GB RAM, 1 TB HDD, 10 GB
> ethernet connection).
> After triggering any MR job, it takes about 3-5 seconds to launch (I mean
> the time until I can see any MR job completion % on the screen).
> I know that internally it's trying to launch the job, initialize mappers, load
> data, etc.
> What I want to know: is this default/desired/expected Hadoop behavior, or
> are there ways in which I can decrease this startup time?
>
> Also, I feel my Hadoop jobs should run faster, but I am still not able
> to make them as fast as I think they should be.
> I did some tuning as well. The following are the parameters I have been
> playing with these days, but I still feel there is something missing that
> I could use:
>
> dfs.block.size:
>
> mapred.compress.map.output
>
> mapred.map/reduce.tasks.speculative.execution
>
> mapred.tasktracker.map/reduce.tasks.maximum:
>
> mapred.child.java.opts
>
> io.sort.mb:
>
> io.sort.factor:
>
> mapred.reduce.parallel.copies:
>
> mapred.job.reuse.jvm.num.tasks:
>
>
> Thanks,
> Praveenesh