HDFS >> mail # user >> M/R job optimization

Re: M/R job optimization
Have you checked the logs?

Is there a task that is taking a long time?  What is that task doing?
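
One rough way to answer the first question is to flag tasks that ran much longer than the median. This assumes you have already collected each task's run time, for example from the job's task list in the web UI. The function name `find_stragglers`, the 3x threshold, and the task IDs and durations below are hypothetical and exist only for illustration.

```python
from statistics import median


def find_stragglers(durations, factor=3.0):
    """Return the IDs of tasks whose run time exceeds `factor` times the median.

    `durations` maps a task ID (any string) to its wall-clock time in seconds.
    """
    if not durations:
        return []
    typical = median(durations.values())
    return sorted(
        task_id
        for task_id, secs in durations.items()
        if secs > factor * typical
    )


if __name__ == "__main__":
    # Hypothetical, made-up durations for four reduce tasks.
    durations = {
        "task_r_000000": 42.0,
        "task_r_000001": 45.0,
        "task_r_000002": 39.0,
        "task_r_000003": 310.0,
    }
    # Median is 43.5 s, so anything above 130.5 s is flagged.
    print(find_stragglers(durations))  # ['task_r_000003']
```

A relative threshold is used instead of a fixed number of seconds so that the same check works for jobs whose tasks normally take seconds and for jobs whose tasks normally take hours.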

There are two basic possibilities:

a) you have a skewed join like the other Ted mentioned.  In this case, the
straggler task will be visibly busy processing data.

b) you have a hung process.  This can be more difficult to diagnose, but
indicates that there is a problem with your cluster.
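
To see why a skewed join produces a single straggler: Hadoop's default HashPartitioner sends every record with the same key to the same reduce task, so one very frequent join key can leave one reducer with most of the work. The sketch below simulates that assignment. It is an illustration under stated assumptions, not Hadoop code: `partition` and `reducer_loads` are hypothetical helpers, CRC32 stands in for Java's `hashCode()` so the result is deterministic across Python runs, and the key distribution is made up.

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    """Pick a reducer for `key`, in the spirit of Hadoop's HashPartitioner.

    Hadoop uses the key's Java hashCode(); CRC32 is a stand-in here.
    The mask keeps the value non-negative, like `& Integer.MAX_VALUE`.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def reducer_loads(keys, num_reducers):
    """Count how many records each reducer would receive."""
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


if __name__ == "__main__":
    # Hypothetical join input: one "hot" key plus many rare keys.
    keys = ["hot"] * 100_000 + [f"user{i}" for i in range(10_000)]
    print(reducer_loads(keys, num_reducers=4))
    print("most frequent key:", Counter(keys).most_common(1))
```

Every "hot" record lands on the same reducer no matter how many reducers you add, which is why a skewed key shows up as one task that is still busy with data while the others have finished.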

On Fri, Apr 26, 2013 at 2:21 AM, Han JU <[EMAIL PROTECTED]> wrote:

> Hi,
> I've implemented an algorithm in Hadoop as a series of 4 jobs. My
> question: in one of the jobs, the map and reduce tasks show 100% complete
> after about 1m 30s, but I have to wait another 5 minutes for the job to finish.
> This job writes about 720 MB of compressed data to HDFS with replication
> factor 1, in SequenceFile format. When I tried copying the same data to HDFS,
> it took less than 20 seconds. What happens during these extra 5 minutes?
> Any ideas on how to optimize this part?
> Thanks.
> --
> *JU Han*
> UTC - Université de Technologie de Compiègne
> *GI06 - Data Mining and Decision Support*
> +33 0619608888