Hadoop >> mail # user >> Re: M/R job optimization


Re: M/R job optimization
I do not think the skewed-reducer hint is the problem here, since Han
mentioned that he has to wait 5 minutes after the job shows progress of
100% map and 100% reduce. It may have something to do with the output
committer; FileOutputCommitter needs to be looked at to see what it is
doing for those 5 minutes, and why committing the job takes so long.

Thanks,
Rahul
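Rahul's hunch can be illustrated: in Hadoop versions of that era, FileOutputCommitter's commit step promotes each task's files out of a `_temporary` directory into the final output directory, one rename per file, so a job with many output files can spend its tail doing nothing but renames. A minimal local sketch of that rename pattern (the directory names mirror Hadoop's layout; this runs on the local filesystem, not HDFS, and the file count is illustrative):

```python
import os
import tempfile

# Mimic FileOutputCommitter's layout: each task writes under
# <output>/_temporary/<attempt>/, and commit renames the files
# into <output>/ one by one.
out = tempfile.mkdtemp()
tmp = os.path.join(out, "_temporary", "attempt_0")
os.makedirs(tmp)

for i in range(4):  # pretend 4 reduce tasks each wrote a part file
    with open(os.path.join(tmp, "part-%05d" % i), "w") as f:
        f.write("data")

# The commit step: one rename per output file.
committed = []
for name in sorted(os.listdir(tmp)):
    os.rename(os.path.join(tmp, name), os.path.join(out, name))
    committed.append(name)

print(committed)  # the part files now live in the output directory
```

On real HDFS each of those renames is a NameNode operation, which is why a commit over many files on a loaded cluster can take minutes even though no data bytes move.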
On Mon, Apr 29, 2013 at 9:29 PM, Ted Xu <[EMAIL PROTECTED]> wrote:

> Hi Han,
>
> I think your point is valid. In fact, you can change the progress-report
> logic by manually calling the Reporter API, but the default is quite
> straightforward: reducer progress is divided into 3 phases, namely the
> copy phase, the merge/sort phase and the reduce phase, each worth ~33%.
> In your case it looks like your program is stuck in the reduce phase. To
> better track the cause, you can check the task logs, as Ted Dunning
> suggested before.
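The ~33%-per-phase split Ted describes means the overall reducer percentage is an equally weighted average of the three phases, and within the reduce phase Hadoop advances progress by how much of the sorted input has been consumed, not by work done inside the user's reduce function. So a reducer whose last keys trigger heavy computation can sit at or near 100% while still busy. A toy calculation (the equal 1/3 weights are the convention described above; exact weighting can vary between Hadoop versions):

```python
def reducer_progress(copy, sort, reduce_phase):
    """Overall reducer progress as an equal-weight average of the
    three phases, each given as a fraction in [0, 1]."""
    return (copy + sort + reduce_phase) / 3.0

# Copy and sort done, reduce phase 95% through its input keys:
p = reducer_progress(1.0, 1.0, 0.95)
print(round(p * 100))  # -> 98: reads as nearly finished, even if the
                       # remaining keys carry the heaviest computation
```

This is why Han's reducers can show 100% and then appear stuck: the gauge measures input consumed, not CPU spent inside the function call.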
>
>
> On Mon, Apr 29, 2013 at 11:17 PM, Han JU <[EMAIL PROTECTED]> wrote:
>
>> Thanks Ted and... Ted...
>> I've been looking at the progress while the job is executing.
>> In fact, I don't think it's a skewed-partition problem. I've looked at the
>> mapper output files: all are of the same size, and each reducer takes a
>> single group.
>> What I want to know is how the Hadoop M/R framework calculates the
>> progress percentage.
>> For example, my reducer:
>>
>> reducer(...) {
>>   call_of_another_func() // lots of complicated calculations
>> }
>>
>> Will the percentage reflect the calculation inside the function call?
>> I ask because I observed that in this job all reducers reached 100% fairly
>> quickly, then got stuck there. During this time the datanodes seem to be
>> working.
>>
>> Thanks.
>>
>>
>> 2013/4/26 Ted Dunning <[EMAIL PROTECTED]>
>>
>>> Have you checked the logs?
>>>
>>> Is there a task that is taking a long time?  What is that task doing?
>>>
>>> There are two basic possibilities:
>>>
>>> a) you have a skewed join like the other Ted mentioned.  In this case,
>>> the straggler will be seen to be working on data.
>>>
>>> b) you have a hung process.  This can be more difficult to diagnose, but
>>> indicates that there is a problem with your cluster.
>>>
>>>
>>>
>>> On Fri, Apr 26, 2013 at 2:21 AM, Han JU <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've implemented an algorithm with Hadoop as a series of 4 jobs. My
>>>> question is that in one of the jobs, map and reduce tasks show 100% finished
>>>> in about 1m 30s, but I have to wait another 5m for the job to finish.
>>>> This job writes about 720 MB of compressed data to HDFS in sequence file
>>>> format, with replication factor 1. I've tried copying the same data to HDFS
>>>> directly, and it takes under 20 seconds. What happens during those extra 5
>>>> minutes?
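The numbers in the question already bound the problem: if 720 MB copies to HDFS in under 20 seconds, the observed write bandwidth is at least ~36 MB/s, so writing the output accounts for only a small fraction of the 5-minute tail. A back-of-the-envelope check (all figures taken from the message above; timings are rough observations, not benchmarks):

```python
data_mb = 720.0      # compressed output written to HDFS
copy_time_s = 20.0   # observed time for a manual copy of the same data
tail_s = 5 * 60.0    # unexplained wait after 100% map / 100% reduce

write_rate = data_mb / copy_time_s   # observed HDFS write rate, MB/s
write_share = copy_time_s / tail_s   # fraction of the tail a plain write
                                     # of the same data would explain

print(write_rate, round(write_share * 100))  # -> 36.0 7
```

So over 90% of the tail is spent on something other than moving bytes, which is what points suspicion at the commit/cleanup phase rather than the HDFS write itself.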
>>>>
>>>> Any idea on how to optimize this part?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> *JU Han*
>>>>
>>>> UTC   -  Université de Technologie de Compiègne
>>>> *     **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>> +33 0619608888
>>>>
>>>
>>>
>>
>>
>> --
>> *JU Han*
>>
>> Software Engineer Intern @ KXEN Inc.
>> UTC   -  Université de Technologie de Compiègne
>> *     **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
>
>
> --
> Regards,
> Ted Xu
>