Re: FW: NNbench and MRBench
On 5/8/2011 12:46 AM, [EMAIL PROTECTED] wrote:
> Thanks Marcos.
> This post by Michael Noll does provide some information about how to run these benchmarks, but there's not much about how to evaluate the results.
> Do you know of any resources on analyzing the results?
>
> Thanks very much :)
>
> Regards,
> Stanley
>
> -----Original Message-----
> From: Marcos Ortiz [mailto:[EMAIL PROTECTED]]
> Sent: May 8, 2011 11:09
> To: [EMAIL PROTECTED]
> Cc: Shi, Stanley
> Subject: Re: FW: NNbench and MRBench
>
> On 5/7/2011 10:33 PM, [EMAIL PROTECTED] wrote:
>    
>> Thanks, Marcos,
>>
>> Through these links, I still can't find anything about NNbench and MRBench.
>>
>> -----Original Message-----
>> From: Marcos Ortiz [mailto:[EMAIL PROTECTED]]
>> Sent: May 8, 2011 10:23
>> To: [EMAIL PROTECTED]
>> Cc: Shi, Stanley
>> Subject: Re: FW: NNbench and MRBench
>>
>> On 5/7/2011 8:53 PM, [EMAIL PROTECTED] wrote:
>>
>>      
>>> Hi guys,
>>>
>>> I have a cluster of 16 machines running Hadoop. Now I want to run some benchmarks on this cluster with "nnbench" and "mrbench".
>>> I'm new to Hadoop and have no one to ask, so I don't know what results I should expect.
>>> For mrbench, I get an average time of 22 seconds for a one-map job. Is that too bad? What should the results look like?
>>>
>>> For nnbench, what are the expected results? Below is my result:
>>> ===============
>>>                               Date & time: 2011-05-05 20:40:25,459
>>>
>>>                            Test Operation: rename
>>>                                Start time: 2011-05-05 20:40:03,820
>>>                               Maps to run: 1
>>>                            Reduces to run: 1
>>>                        Block Size (bytes): 1
>>>                            Bytes to write: 0
>>>                        Bytes per checksum: 1
>>>                           Number of files: 10000
>>>                        Replication factor: 1
>>>                Successful file operations: 10000
>>>
>>>            # maps that missed the barrier: 0
>>>                              # exceptions: 0
>>>
>>>                               TPS: Rename: 1763
>>>                Avg Exec time (ms): Rename: 0.5672
>>>                      Avg Lat (ms): Rename: 0.4844
>>> null
>>>
>>>                     RAW DATA: AL Total #1: 4844
>>>                     RAW DATA: AL Total #2: 0
>>>                  RAW DATA: TPS Total (ms): 5672
>>>           RAW DATA: Longest Map Time (ms): 5672.0
>>>                       RAW DATA: Late maps: 0
>>>                 RAW DATA: # of exceptions: 0
>>> ============================
>>> One more question: when I set the number of maps higher, I get all-zero results:
>>> ============================
>>>                            Test Operation: create_write
>>>                                Start time: 2011-05-03 23:22:39,239
>>>                               Maps to run: 160
>>>                            Reduces to run: 160
>>>                        Block Size (bytes): 1
>>>                            Bytes to write: 0
>>>                        Bytes per checksum: 1
>>>                           Number of files: 1
>>>                        Replication factor: 1
>>>                Successful file operations: 0
>>>
>>>            # maps that missed the barrier: 0
>>>                              # exceptions: 0
>>>
>>>                   TPS: Create/Write/Close: 0
>>> Avg exec time (ms): Create/Write/Close: 0.0
>>>                Avg Lat (ms): Create/Write: NaN
>>>                       Avg Lat (ms): Close: NaN
>>>
>>>                     RAW DATA: AL Total #1: 0
>>>                     RAW DATA: AL Total #2: 0
>>>                  RAW DATA: TPS Total (ms): 0
>>>           RAW DATA: Longest Map Time (ms): 0.0
>>>                       RAW DATA: Late maps: 0
>>>                 RAW DATA: # of exceptions: 0
>>> ====================
>>> Can anyone point me to some documents?
>>> I really appreciate your help :)
>>>
Ok, I understand.
Let me try to help you, since I'm also fairly new to the Hadoop ecosystem.
Tom White, in his answer on this topic on the O'Reilly Answers site, gives an introduction to this:

The following command writes 10 files of 1,000 MB each:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

At the end of the run, the results are written to the console and also
recorded in a local file (which is appended to, so you can rerun the
benchmark and not lose old results):

% cat TestDFSIO_results.log
           Date & time: Sun Apr 12 07:14:09 EDT 2009
       Number of files: 10
Total MBytes processed: 10000
     Throughput mb/sec: 7.796340865378244
Average IO rate mb/sec: 7.8862199783325195
 IO rate std deviation: 0.9101254683525547
    Test exec time sec: 163.387
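Since the log file is appended to, one quick way to compare several runs (just a sketch, using plain grep on the default log name above) is to pull out the throughput and exec-time lines:

% grep -E 'Throughput|exec time' TestDFSIO_results.log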

The files are written under the /benchmarks/TestDFSIO directory by default (this can be changed by setting the test.build.data system property), in a directory called io_data.
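If you want to double-check where the benchmark data ended up (assuming the default location, i.e. test.build.data was not changed), you can simply list it with the HDFS shell:

% hadoop fs -ls /benchmarks/TestDFSIO/io_data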

To run a read benchmark, use the -read argument. Note that these files must already exist (having been written by TestDFSIO -write):

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

Here are the results for a real run:
           Date & time: Sun Apr 12 07:24:28 EDT 2009
       Number of files: 10
Total MBytes processed: 10000
     Throughput mb/sec: 80.25553361904304
Average IO rate mb/sec: 98.6801528930664
 IO rate std deviation: 36.63507598174921
    Test exec time sec: 47.624

When you've finished benchmarking, you can delete all the generated files from HDFS using the -clean argument:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -clean
As you can see, all results are written to TestDFSIO_results.log.

So, you can begin to experiment with this.
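For the nnbench and mrbench runs you asked about, the usual invocations (along the lines of what's in Michael Noll's post mentioned earlier; option names can vary between Hadoop versions, so treat this as a rough sketch rather than gospel) look more or less like this:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar mrbench -numRuns 50

As far as I know, there is no single "correct" number for either benchmark; the results are mostly useful for comparing the same cluster before and after a configuration change, or against another cluster of similar size.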
You can continue reading in Chapter 9 of Hadoop: The Definitive Guide, 2nd Edition.