HDFS >> mail # user >> M/R job optimization


+
Han JU 2013-04-26, 09:21
+
Ted Xu 2013-04-26, 09:48
-
Re: M/R job optimization
Have you checked the logs?

Is there a task that is taking a long time?  What is that task doing?

There are two basic possibilities:

a) you have a skewed join, as the other Ted mentioned.  In this case, the
straggler will be visibly making progress through its data.

b) you have a hung process.  This can be more difficult to diagnose, but
indicates that there is a problem with your cluster.
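
To make the distinction between (a) and (b) concrete, here is a rough sketch of how you might triage the slow tasks once you have pulled per-task numbers out of the logs or the job UI by hand. Everything in it is an assumption for illustration: the function name, the dictionary fields, the task ids, the sample numbers and the 3x thresholds are all made up, and none of it is a Hadoop API. A slow task with far more input than its peers looks like skew; a slow task with ordinary input looks more like a hang.

```python
from statistics import median


def classify_stragglers(tasks, slow_factor=3.0, skew_factor=3.0):
    """Label slow tasks as 'skewed' or 'possibly hung'.

    tasks: list of dicts with hypothetical keys 'id', 'seconds'
    and 'input_bytes', filled in by hand from the task logs.
    """
    med_secs = median(t["seconds"] for t in tasks)
    med_bytes = median(t["input_bytes"] for t in tasks)
    report = {}
    for t in tasks:
        if t["seconds"] <= slow_factor * med_secs:
            continue  # runtime is in line with the other tasks
        if t["input_bytes"] >= skew_factor * med_bytes:
            # Slow, but it has much more data to chew through: case (a).
            report[t["id"]] = "skewed"
        else:
            # Slow with an ordinary amount of data: case (b) is more likely.
            report[t["id"]] = "possibly hung"
    return report


# Made-up numbers, only to show the shape of the input.
tasks = [
    {"id": "r_000000", "seconds": 40, "input_bytes": 100_000_000},
    {"id": "r_000001", "seconds": 45, "input_bytes": 110_000_000},
    {"id": "r_000002", "seconds": 300, "input_bytes": 900_000_000},
    {"id": "r_000003", "seconds": 310, "input_bytes": 95_000_000},
    {"id": "r_000004", "seconds": 42, "input_bytes": 105_000_000},
]
print(classify_stragglers(tasks))
```

The medians are used instead of means so that one huge straggler does not drag the baseline up and hide itself. The thresholds are arbitrary and would need tuning against your own jobs.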

On Fri, Apr 26, 2013 at 2:21 AM, Han JU <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I've implemented an algorithm with Hadoop as a series of 4 jobs. My
> question is about one of the jobs: its map and reduce tasks show 100%
> finished in about 1m 30s, but I have to wait another 5m for the job to
> finish. This job writes about 720 MB of compressed data to HDFS with
> replication factor 1, in sequence file format. I've tried copying this data
> to HDFS directly, and it takes less than 20 seconds. What happens during
> those 5 extra minutes?
>
> Any idea on how to optimize this part?
>
> Thanks.
>
> --
> *JU Han*
>
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>
+
Han JU 2013-04-29, 15:17