Re: Running terasort with 1 map task
Mahesh Balija 2013-02-26, 23:07
Did passing dfs.block.size=134217728 resolve your issue, or did
something else fix your problem?

On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury <[EMAIL PROTECTED]> wrote:

> sorry my bad, it solved
>
> On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <[EMAIL PROTECTED]> wrote:
>
>> In my $HADOOP_HOME/conf/hdfs-site.xml, I have mentioned the data-block
>> size
>>
>> <property>
>>   <name>dfs.block.size</name>
>>   <value>134217728</value>
>>   <final>true</final>
>> </property>
>>
>> While running the teragen I am again specifying it to be sure:
>>
>> hadoop jar /opt/hadoop-1.0.4/hadoop-examples-1.0.4.jar teragen
>> -Dmapred.map.tasks=1 -Dmapred.reduce.tasks=1 -Ddfs.block.size=134217728
>> 320000 /user/hadoop/input
>>
>> but it generates 3 blocks:
>>
>> hadoop fsck -blocks -files -locations /user/hadoop/input
>> Status: HEALTHY
>> Total size: 32029543 B
>> Total dirs: 3
>> Total files: 4
>> Total blocks (validated): 3 (avg. block size 10676514 B)
>> Minimally replicated blocks: 3 (100.0 %)
>>
>> What I am doing wrong? How can I generate only one block?
>>
>> On Tue, Feb 26, 2013 at 12:52 PM, Arindam Choudhury <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks . As Julien said I want to do a performance measurement.
>>>
>>> Actually,
>>>
>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>
>>> has generated:
>>> Total size: 3200029737 B
>>> Total dirs: 3
>>> Total files: 5
>>> Total blocks (validated): 27 (avg. block size 118519619 B)
>>>
>>> Thats why so many maps.
>>>
>>> On Tue, Feb 26, 2013 at 12:46 PM, Julien Muller <[EMAIL PROTECTED]> wrote:
>>>
>>>> Maybe your goal is to have a baseline for performance measurement?
>>>> In that case, you might want to consider running only one taskTracker?
>>>> You would have multiple tasks but running on only 1 machine. Also, you
>>>> could make mappers run serially, by configuring only one map slot on your 1
>>>> node cluster.
>>>>
>>>> Nevertheless I agree with Bertrand, this is not really a realistic use
>>>> case (or maybe you can give us more clues).
>>>>
>>>> Julien
>>>>
>>>> 2013/2/26 Bertrand Dechoux <[EMAIL PROTECTED]>
>>>>
>>>>> http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>>>>>
>>>>> It is possible to have a single mapper if the input is not splittable
>>>>> BUT it is rarely seen as a feature.
>>>>> One could ask why you want to use a platform for distributed computing
>>>>> for a job that shouldn't be distributed.
>>>>>
>>>>> Regards
>>>>>
>>>>> Bertrand
>>>>>
>>>>> On Tue, Feb 26, 2013 at 12:09 PM, Arindam Choudhury <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to run terasort using one map and one reduce. so, I
>>>>>> generated the input data using:
>>>>>>
>>>>>> hadoop jar hadoop-examples-1.0.4.jar teragen -Dmapred.map.tasks=1
>>>>>> -Dmapred.reduce.tasks=1 32000000 /user/hadoop/input32mb1map
>>>>>>
>>>>>> Then I launched the hadoop terasort job using:
>>>>>>
>>>>>> hadoop jar hadoop-examples-1.0.4.jar terasort -Dmapred.map.tasks=1
>>>>>> -Dmapred.reduce.tasks=1 /user/hadoop/input32mb1map /user/hadoop/output1
>>>>>>
>>>>>> I thought it will run the job using 1 map and 1 reduce, but when
>>>>>> inspect the job statistics I found:
>>>>>>
>>>>>> hadoop job -history /user/hadoop/output1
>>>>>>
>>>>>> Task Summary
>>>>>> ===========================
>>>>>> Kind     Total  Successful  Failed  Killed  StartTime             FinishTime
>>>>>>
>>>>>> Setup    1      1           0       0       26-Feb-2013 10:57:47  26-Feb-2013 10:57:55 (8sec)
>>>>>> Map      24     24          0       0       26-Feb-2013 10:57:57  26-Feb-2013 11:05:37 (7mins, 40sec)
>>>>>> Reduce   1      1           0       0       26-Feb-2013 10:58:21  26-Feb-2013 11:08:31 (10mins, 10sec)
>>>>>> Cleanup  1      1           0       0       26-Feb-2013 11:08:32  26-Feb-2013 11:08:36 (4sec)
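
A rough way to sanity-check the block counts reported in this thread is to do the arithmetic by hand. The sketch below assumes that teragen writes fixed 100-byte rows and that the generated data lands in a single file, as with one map task. The function name is made up for illustration. It only covers the generated data, so any small extra files in the output directory (job logs, for example) would add their own blocks to the fsck total.

```python
import math

TERAGEN_ROW_BYTES = 100  # assumption: teragen writes fixed 100-byte rows


def expected_data_blocks(rows, block_size_bytes, row_bytes=TERAGEN_ROW_BYTES):
    """Estimate how many HDFS blocks a single generated file of `rows` rows occupies."""
    total_bytes = rows * row_bytes
    return math.ceil(total_bytes / block_size_bytes)


BLOCK_128MB = 134217728  # dfs.block.size value quoted in the thread

# First run in the thread: 32000000 rows (about 3.2 GB, not 32 MB)
print(expected_data_blocks(32000000, BLOCK_128MB))  # 24

# Second run: 320000 rows (about 32 MB), which fits in a single block
print(expected_data_blocks(320000, BLOCK_128MB))  # 1
```

Under these assumptions the first run yields 24 data blocks, which lines up with the 24 map tasks in the job history, and the second run yields 1. The remaining blocks in the fsck totals (27 and 3) presumably belong to other small files in the same directory.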