HDFS >> mail # user >> Running terasort with 1 map task


Re: Running terasort with 1 map task
Maybe your goal is to have a baseline for performance measurement?
In that case, you might consider running only one TaskTracker. You would
still have multiple tasks, but they would all run on a single machine. You
could also make the mappers run serially by configuring only one map slot
on your single-node cluster.
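
As a rough sketch of the one-map-slot setup mentioned above: in Hadoop 1.x the
number of map slots per TaskTracker is controlled by the
mapred.tasktracker.map.tasks.maximum property. The helper name below is
hypothetical, and the conf/mapred-site.xml location is an assumption based on
a stock Hadoop 1.x layout; adjust both to your installation.

```shell
#!/bin/sh
# Print the property fragment that limits a TaskTracker to one map slot.
# Assumption: the fragment is pasted inside <configuration> in
# conf/mapred-site.xml on the node, and the TaskTracker is then restarted.
# one_map_slot_fragment is a hypothetical helper name, not a Hadoop command.
one_map_slot_fragment() {
cat <<'EOF'
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
EOF
}

one_map_slot_fragment
```

With one slot the job still creates the same number of map tasks; they simply
run one after another on that node.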

Nevertheless, I agree with Bertrand: this is not really a realistic use case
(or maybe you can give us more clues).

Julien
2013/2/26 Bertrand Dechoux <[EMAIL PROTECTED]>

> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> It is possible to have a single mapper if the input is not splittable BUT
> it is rarely seen as a feature.
> One could ask why you want to use a platform for distributed computing for
> a job that shouldn't be distributed.
>
> Regards
>
> Bertrand
>
>
>
> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
> [EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>> I am trying to run terasort using one map and one reduce, so I generated
>> the input data using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>
>> Then I launched the hadoop terasort job using:
>>
>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>
>> I thought it would run the job using 1 map and 1 reduce, but when I
>> inspected the job statistics I found:
>>
>> hadoop job -history /user/hadoop/output1
>>
>> Task Summary
>> ===========================
>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>>
>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
>> ===========================
>>
>> So, although I asked for one map task, 24 of them were launched.
>>
>> How can I solve this problem? How do I tell Hadoop to launch only one map?
>>
>> Thanks,
>>
>
>
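
For what it is worth, the 24 maps reported in the job history are consistent
with the input being cut into block-sized splits: teragen writes 100-byte
rows, so 32000000 rows come to about 3.2 GB. The sketch below approximates how
the old-API FileInputFormat chooses splits, as far as I recall it, for a
single input file. The 128 MiB block size is an assumption (the thread does
not state it), picked because it reproduces the count of 24. count_splits is a
hypothetical helper, not part of Hadoop.

```shell
#!/bin/sh
# Estimate the number of input splits (and so map tasks) for ONE file.
# Args: file_bytes block_bytes requested_maps min_split_bytes
# Sketch of the assumed rule: splitSize = max(minSize, min(goalSize, blockSize))
count_splits() {
    file_bytes=$1
    block_bytes=$2
    requested_maps=$3
    min_split=$4

    # goalSize: total size divided by the requested map count (only a hint)
    split=$(( file_bytes / requested_maps ))

    # Cap the split at the block size, then raise it to the minimum split size
    if [ "$block_bytes" -lt "$split" ]; then split=$block_bytes; fi
    if [ "$min_split" -gt "$split" ]; then split=$min_split; fi

    # Cut full-size splits while more than 1.1 splits' worth of data remains,
    # then put whatever is left into one final split.
    remaining=$file_bytes
    count=0
    while [ $(( remaining * 10 > split * 11 )) -eq 1 ]; do
        remaining=$(( remaining - split ))
        count=$(( count + 1 ))
    done
    if [ "$remaining" -gt 0 ]; then count=$(( count + 1 )); fi
    echo "$count"
}

# 32000000 rows * 100 bytes, 128 MiB blocks (assumed), 1 requested map
count_splits $(( 32000000 * 100 )) $(( 128 * 1024 * 1024 )) 1 1            # prints 24
# Same file with the minimum split size raised to the file size
count_splits $(( 32000000 * 100 )) $(( 128 * 1024 * 1024 )) 1 3200000000   # prints 1
```

If this model holds, mapred.map.tasks acts only as a hint, and raising
mapred.min.split.size to at least the file size (for example by passing it
with -D on the terasort command line) might yield a single map. I have not
verified this on the cluster described in the thread.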