MapReduce >> mail # user >> Re: Running terasort with 1 map task


Re: Running terasort with 1 map task
Does passing dfs.block.size=134217728 resolve your issue, or was it
something else that fixed your problem?

On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury <
[EMAIL PROTECTED]> wrote:

> sorry, my bad, it's solved
>
>
> On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
> [EMAIL PROTECTED]> wrote:
>
>> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
>> size
>>
>> <property>
>>   <name>dfs.block.size</name>
>>   <value>134217728</value>
>>   <final>true</final>
>> </property>
>>
>> While running the teragen I am again specifying it to be sure:
>>
>> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
>> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
>> 320000 /user/hadoop/input
>>
>> but it generates 3 blocks:
>>
>> hadoop fsck -blocks -files -locations /user/hadoop/input
>> Status: HEALTHY
>>  Total size:    32029543 B
>>  Total dirs:    3
>>  Total files:    4
>>  Total blocks (validated):    3 (avg. block size 10676514 B)
>>  Minimally replicated blocks:    3 (100.0 %)
>>
>> What am I doing wrong? How can I generate only one block?
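For context on the fsck numbers above (an interpretation, not stated in the thread): teragen writes 100-byte rows, so 320,000 rows is 32,000,000 B, which fits comfortably in a single 128 MB block. The "avg. block size" fsck prints is simply total size divided by block count, so the other two blocks plausibly belong to the small _logs side files (note "Total files: 4"), not to the data file itself:

```python
rows, row_bytes = 320_000, 100   # teragen rows are 100 bytes each
data_bytes = rows * row_bytes    # 32,000,000 B -- fits in one 128 MB block

# Values reported by fsck for /user/hadoop/input:
total_size, total_blocks = 32_029_543, 3

# fsck's "avg. block size" is just total size / block count,
# so small _logs files drag the average far below the data block's size.
avg_block = total_size // total_blocks
print(data_bytes, avg_block)     # 32000000 10676514
```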
>>
>>
>>
>> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Thanks. As Julien said, I want to do a performance measurement.
>>>
>>> Actually,
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>
>>> has generated:
>>> Total size:    3200029737 B
>>> Total dirs:    3
>>> Total files:    5
>>> Total blocks (validated):    27 (avg. block size 118519619 B)
>>>
>>> That's why there are so many maps.
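The map count can be checked directly: for a splittable file, FileInputFormat produces roughly one split per block, and one map task per split (a sketch; the exact split math also involves the minimum split size):

```python
import math

block_size = 134_217_728         # dfs.block.size from hdfs-site.xml (128 MB)
data_size = 3_200_029_737        # teragen output size reported by fsck, in bytes

# One map task per input split; with a splittable file each block
# becomes one split, so the map count is roughly:
expected_maps = math.ceil(data_size / block_size)
print(expected_maps)             # -> 24, matching the job history's 24 maps
```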
>>>
>>>
>>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <[EMAIL PROTECTED]
>>> > wrote:
>>>
>>>> Maybe your goal is to have a baseline for performance measurement?
>>>> In that case, you might want to consider running only one TaskTracker.
>>>> You would have multiple tasks, but running on only one machine. Also, you
>>>> could make mappers run serially by configuring only one map slot on your
>>>> one-node cluster.
>>>>
>>>> Nevertheless, I agree with Bertrand: this is not really a realistic use
>>>> case (or maybe you can give us more clues).
>>>>
>>>> Julien
>>>>
>>>>
>>>> 2013/2/26 Bertrand Dechoux <[EMAIL PROTECTED]>
>>>>
>>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>>
>>>>> It is possible to have a single mapper if the input is not splittable
>>>>> BUT it is rarely seen as a feature.
>>>>> One could ask why you want to use a platform for distributed computing
>>>>> for a job that shouldn't be distributed.
>>>>>
>>>>> Regards
>>>>>
>>>>> Bertrand
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <
>>>>> [EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to run terasort using one map and one reduce. So I
>>>>>> generated the input data using:
>>>>>>
>>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>>
>>>>>> Then I launched the hadoop terasort job using:
>>>>>>
>>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>>
>>>>>> I thought it would run the job using 1 map and 1 reduce, but when I
>>>>>> inspected the job statistics I found:
>>>>>>
>>>>>> hadoop job -history /user/hadoop/output1
>>>>>>
>>>>>> Task Summary
>>>>>> ============================
>>>>>> Kind    Total    Successful    Failed    Killed    StartTime    FinishTime
>>>>>>
>>>>>> Setup    1    1        0    0    26-Feb-2013 10:57:47    26-Feb-2013
>>>>>> 10:57:55 (8sec)
>>>>>> Map    24    24        0    0    26-Feb-2013 10:57:57    26-Feb-2013
>>>>>> 11:05:37 (7mins, 40sec)
>>>>>> Reduce    1    1        0    0    26-Feb-2013 10:58:21    26-Feb-2013
>>>>>> 11:08:31 (10mins, 10sec)
>>>>>> Cleanup    1    1        0    0    26-Feb-2013 11:08:32
>>>>>> 26-Feb-2013 11:08:36 (4sec)
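The 24 maps in the task summary above follow from how Hadoop 1.x computes split sizes: mapred.map.tasks is only a hint that feeds into a goal size, and a split can never exceed one block. A sketch of the old mapred API's FileInputFormat.computeSplitSize (reconstructed from memory, so treat it as an approximation) shows why, and why raising mapred.min.split.size above the file size would force a single split instead:

```python
def compute_split_size(goal_size, min_size, block_size):
    # Hadoop 1.x FileInputFormat (old mapred API):
    # splitSize = max(minSize, min(goalSize, blockSize))
    # where goalSize = totalSize / requested number of maps.
    return max(min_size, min(goal_size, block_size))

total = 3_200_029_737    # teragen output size in bytes
block = 134_217_728      # 128 MB dfs.block.size

# With mapred.map.tasks=1, goalSize == totalSize, but the split is
# still capped at one block, so ~24 maps result anyway:
print(compute_split_size(total, 1, block))               # 134217728

# Raising mapred.min.split.size above the file size forces one split:
print(compute_split_size(total, 4_000_000_000, block))   # 4000000000
```

A non-splittable input (e.g. a gzipped file), as Bertrand notes, is the other way to end up with a single mapper.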