Brad,
Thanks for the clear explanation.
To see the compute capabilities of my box's hardware, which benchmarks can
I use, and which values should I look at apart from task run time?
Arun
On Fri, Dec 9, 2011 at 9:54 PM, Brad Sarsfield [via Lucene] <ml-node+[EMAIL PROTECTED]> wrote:
> Hi Arun
>
> TestDFSIO is good; I like "Teragen/Terasort" as an IO benchmark to help
> understand the IO capabilities of your hardware and network (running at GB
> scale if you want to look at a single box). There are a number of dials
> you can turn in your experiment that will reveal different things about
> your setup.
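A simple way to compare boxes from a TeraSort or TestDFSIO run is to divide the data volume by the elapsed time. The sketch below is a minimal illustration, assuming decimal megabytes (10^6 bytes); the function name and the 250-second runtime are hypothetical placeholders, not measured results. TestDFSIO also prints its own throughput figures, which may be computed differently.

```python
# Minimal sketch (assumption): effective I/O throughput from the amount of
# data a benchmark job processed and its wall-clock runtime.


def throughput_mb_per_s(bytes_processed: int, elapsed_seconds: float) -> float:
    """Return throughput in MB/s, using decimal megabytes (10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_processed / 1_000_000 / elapsed_seconds


if __name__ == "__main__":
    ten_gb = 10 * 10**9           # e.g. a 10 GB TeraSort input
    hypothetical_runtime = 250.0  # seconds; placeholder, not a real measurement
    print(f"{throughput_mb_per_s(ten_gb, hypothetical_runtime):.1f} MB/s")  # prints "40.0 MB/s"
```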
>
> The other thing that you'll want to rationalize is the total number of
> tasks; a slight oversubscription of map/reduce tasks to cores, depending on
> your workload, may be a good place to start optimizing. Knowing what
> each of your hardware configurations is capable of (B1 and B2 in your
> case) will help you set expectations of what each box is physically
> able to do.
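A slight oversubscription can be written down as a rule of thumb. In this sketch the 1.25 factor and the two-thirds map share are hypothetical starting points, not values from this thread or Hadoop defaults, and suggest_slots is an illustrative helper. In Hadoop 0.20-era setups the resulting numbers would typically go into mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum on each TaskTracker.

```python
import math
from typing import Tuple

# Hypothetical rule of thumb (not a Hadoop default): oversubscribe task
# slots slightly relative to physical cores, then split them between
# map and reduce slots. Tune both knobs for your own workload.


def suggest_slots(cores: int, factor: float = 1.25,
                  map_share: float = 2 / 3) -> Tuple[int, int]:
    """Return (map_slots, reduce_slots) for a single worker node."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    total = max(2, math.ceil(cores * factor))  # room for at least one map and one reduce
    # Keep at least one slot of each kind.
    map_slots = min(total - 1, max(1, round(total * map_share)))
    reduce_slots = total - map_slots
    return map_slots, reduce_slots


if __name__ == "__main__":
    maps, reduces = suggest_slots(8)  # e.g. an 8-core box -> (7, 3)
    print(f"mapred.tasktracker.map.tasks.maximum = {maps}")
    print(f"mapred.tasktracker.reduce.tasks.maximum = {reduces}")
```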
>
> How?
> Generate: hadoop jar hadoop-examples-xxx.jar teragen -conf terasort.xml 100000000 10GBsort-input
> Sort: hadoop jar hadoop-examples-xxx.jar terasort -conf terasort.xml 10GBsort-input 10GBsort-output
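Note that the 100000000 argument to teragen is a row count, not a byte count. Assuming TeraGen's fixed 100-byte rows, that works out to 10^10 bytes, i.e. 10 GB in decimal units (about 9.31 GiB). The helpers below are illustrative names for that arithmetic, not part of the Hadoop examples jar.

```python
# Illustrative helpers (hypothetical names) for sizing TeraGen input,
# assuming TeraGen writes fixed-size 100-byte rows.
TERAGEN_ROW_BYTES = 100


def bytes_for_rows(rows: int) -> int:
    """Total bytes of TeraGen output for a given row count (before replication)."""
    return rows * TERAGEN_ROW_BYTES


def rows_for_bytes(target_bytes: int) -> int:
    """Smallest row count whose output is at least target_bytes."""
    return -(-target_bytes // TERAGEN_ROW_BYTES)  # ceiling division on ints


if __name__ == "__main__":
    rows = 100_000_000  # the row count passed to teragen in the command above
    size = bytes_for_rows(rows)
    print(size, "bytes =", size / 10**9, "GB =", round(size / 2**30, 2), "GiB")
```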
>
> Then, in terasort.xml, you can play with many values; remember to change
> only one at a time. 10GB should work in your case:
> <configuration>
>   <property>
>     <name>dfs.replication</name>
>     <value>1</value>
>   </property>
>   <property>
>     <name>mapred.map.tasks</name>
>     <value>25</value>
>   </property>
>   <property>
>     <name>mapred.reduce.tasks</name>
>     <value>10</value>
>   </property>
>   <property>
>     <name>dfs.block.size</name>
>     <value>134217728</value> <!-- 536870912 == 512 MB, 268435456 == 256 MB, 134217728 == 128 MB -->
>   </property>
>   <!-- .... etc -->
> </configuration>
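The "one change at a time" advice can be mechanized by generating one config file per variant from a fixed baseline. This is a sketch under assumptions: the baseline mirrors the terasort.xml values quoted above, while the sweep values and helper names are hypothetical. Each generated XML could then be passed to terasort with -conf.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Tuple

# Baseline mirrors the terasort.xml values quoted above.
BASELINE: Dict[str, str] = {
    "dfs.replication": "1",
    "mapred.map.tasks": "25",
    "mapred.reduce.tasks": "10",
    "dfs.block.size": str(128 * 1024 * 1024),  # 134217728 bytes = 128 MB
}

# Hypothetical sweep: alternative values to try for each dial.
SWEEPS: Dict[str, List[str]] = {
    "dfs.block.size": [str(256 * 1024 * 1024), str(512 * 1024 * 1024)],
    "mapred.reduce.tasks": ["5", "20"],
}


def to_xml(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def one_at_a_time(baseline: Dict[str, str],
                  sweeps: Dict[str, List[str]]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (label, config) pairs that differ from baseline in exactly one property."""
    for name, values in sweeps.items():
        for value in values:
            cfg = dict(baseline)
            cfg[name] = value
            yield f"{name}={value}", cfg


if __name__ == "__main__":
    for label, cfg in one_at_a_time(BASELINE, SWEEPS):
        print(f"--- variant {label}")
        print(to_xml(cfg))
```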
>
> -----Original Message-----
> From: alo alt [mailto:[hidden email]]
> Sent: Friday, December 09, 2011 2:23 AM
> To: [hidden email]
> Subject: Re: Choosing IO intensive and CPU intensive workloads
>
> Hi Arun,
>
> In hadoop-*test*.jar we have a lot of test cases; could any of them match
> yours?
> #> cd /usr/lib/hadoop-0.20/ && hadoop jar hadoop-*test*.jar
>
> - Alex
>
>
>
> On Fri, Dec 9, 2011 at 10:58 AM, ArunKumar <[hidden email]> wrote:
>
> > Alex,
> >
> > To see the behavior of a single node under a compute-intensive benchmark,
> > which parameters other than job finish time are available or worth
> > considering?
> >
> > Arun
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Choosing-IO-and-CPU-intensive-workloads-tp3572282p3572519.html
> > Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
> >
>
>
>
> --
> Alexander Lorenz
>
> http://mapredit.blogspot.com
>
>
>