Have you checked the logs?
Is there a task that is taking a long time? What is that task doing?
There are two basic possibilities:
a) you have a skewed join like the other Ted mentioned. In this case, the
straggler will still be making progress on its data, just far more of it
than the other tasks got.
b) you have a hung process. The straggler makes no progress at all. This
can be more difficult to diagnose, but it indicates a problem with your
cluster.
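
To make the distinction between (a) and (b) concrete, here is a minimal sketch in Python. It is not a Hadoop API: the function name, the input format (progress samples you would collect yourself from the task pages or logs), and the slow_factor threshold are all assumptions made for illustration.

```python
from statistics import median


def classify_stragglers(tasks, slow_factor=3.0):
    """Label unusually slow tasks as 'skewed' or 'hung'.

    tasks: hypothetical dict mapping a task id to a time-ordered list of
           (elapsed_seconds, records_processed) samples.
    slow_factor: assumed threshold; a task is a straggler when its elapsed
           time exceeds slow_factor times the median elapsed time.
    """
    # Latest elapsed time per task, skipping tasks with no samples.
    elapsed = {tid: s[-1][0] for tid, s in tasks.items() if s}
    if not elapsed:
        return {}
    typical = median(elapsed.values())

    verdicts = {}
    for tid, samples in tasks.items():
        if not samples or elapsed[tid] <= slow_factor * typical:
            continue  # not a straggler
        # Case (a): the record count is still growing between the last two samples.
        still_working = len(samples) >= 2 and samples[-1][1] > samples[-2][1]
        verdicts[tid] = "skewed" if still_working else "hung"
    return verdicts


if __name__ == "__main__":
    sample = {
        "r0": [(30, 1000), (60, 2000)],
        "r1": [(30, 1000), (65, 2100)],
        "r2": [(30, 1000), (70, 2200)],
        "r3": [(200, 5000), (400, 9000)],  # slow, but still processing records
        "r4": [(200, 1500), (400, 1500)],  # slow, and no progress
    }
    print(classify_stragglers(sample))
```

The median is used as the baseline so that one or two stragglers do not drag the "typical" duration up with them.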
On Fri, Apr 26, 2013 at 2:21 AM, Han JU <[EMAIL PROTECTED]> wrote:
> I've implemented an algorithm with Hadoop as a series of 4 jobs. My
> question is that in one of the jobs, the map and reduce tasks show 100%
> finished in about 1m 30s, but I have to wait another 5m for the job to finish.
> This job writes about 720 MB of compressed data to HDFS with replication
> factor 1, in sequence file format. I've tried copying this data to HDFS
> directly, and it takes less than 20 seconds. What happens during these 5
> extra minutes? Any idea how to optimize this part?
> *JU Han*
> UTC - Université de Technologie de Compiègne
> *GI06 - Fouille de Données et Décisionnel*
> +33 0619608888