Re: Running terasort with 1 map task
Maybe your goal is to have a baseline for performance measurement?
In that case, you might want to consider running only one TaskTracker. You
would still have multiple tasks, but they would all run on a single machine.
You could also make the mappers run serially by configuring only one map slot
on your one-node cluster.
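
For illustration, here is a minimal sketch of the "one map slot" setting. It assumes a Hadoop 1.x TaskTracker, where the slot count comes from mapred.tasktracker.map.tasks.maximum in mapred-site.xml; the helper name one_map_slot_property is made up for this example. The function only prints the fragment, which would still have to be placed inside the <configuration> element of that file by hand.

```shell
# Sketch only: print the mapred-site.xml property that limits a Hadoop 1.x
# TaskTracker to a single map slot. The function name is hypothetical.
one_map_slot_property() {
  cat <<'EOF'
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
EOF
}

# Show the fragment so it can be copied into mapred-site.xml.
one_map_slot_property
```

The TaskTracker reads this value at startup, so it would need a restart to pick up the change. Note that this limits how many maps run at once on the node; it does not reduce the number of map tasks the job creates.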

Nevertheless, I agree with Bertrand: this is not really a realistic use case
(or maybe you can give us more clues).

Julien

2013/2/26 Bertrand Dechoux <[EMAIL PROTECTED]>

> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> [EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce. So I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it would run the job using 1 map and 1 reduce, but when I
>> inspected the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ===========================
>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>>
>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>> ===========================
>>
>> So, although I asked for one map task, 24 were launched.
>>
>> How can I solve this problem? How do I tell Hadoop to launch only one map?
>>
>> Thanks,
>>
>
>
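
As a rough check on where the 24 maps come from, the sketch below estimates the number of input splits for the teragen output in the original question. It rests on several assumptions that the thread does not state: TeraGen rows are 100 bytes each, the data sits in one file, the HDFS block size is 128 MB, and the split size follows the old-API FileInputFormat rule max(minSize, min(goalSize, blockSize)), with goalSize being the total size divided by the requested mapred.map.tasks. The function name estimate_maps is made up for this example, and the small slop factor Hadoop applies to the last split is ignored.

```shell
# Sketch only: estimate how many map tasks a splittable input produces.
# Usage: estimate_maps ROWS BLOCK_BYTES MIN_SPLIT_BYTES [REQUESTED_MAPS]
estimate_maps() {
  rows=$1
  block_bytes=$2          # assumed dfs.block.size
  min_split_bytes=$3      # assumed mapred.min.split.size
  requested_maps=${4:-1}  # value passed as -Dmapred.map.tasks

  row_bytes=100           # assumption: TeraGen writes 100-byte rows
  total_bytes=$((rows * row_bytes))
  goal_bytes=$((total_bytes / requested_maps))

  # split size = max(min_split, min(goal, block))
  split_bytes=$goal_bytes
  if [ "$block_bytes" -lt "$split_bytes" ]; then
    split_bytes=$block_bytes
  fi
  if [ "$min_split_bytes" -gt "$split_bytes" ]; then
    split_bytes=$min_split_bytes
  fi

  # Round up: a partial final split still needs its own map task.
  echo $(( (total_bytes + split_bytes - 1) / split_bytes ))
}

# 32,000,000 rows, 128 MB blocks, minimum split size of 1 byte.
estimate_maps 32000000 $((128 * 1024 * 1024)) 1
```

Under these assumptions the estimate is 24, which matches the job history: the mapred.map.tasks=1 hint only sets the goal size, and the block size caps each split. By the same formula, a minimum split size at least as large as the whole input (3,200,000,000 bytes here) would yield a single split, which fits Bertrand's remark that a single mapper is possible when the input is not split. Whether terasort in 1.0.4 honors that setting is not confirmed in this thread.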