|
ArunKumar
2011-12-09, 07:24
alo alt
2011-12-09, 07:53
ArunKumar
2011-12-09, 09:08
alo alt
2011-12-09, 09:15
ArunKumar
2011-12-09, 09:58
alo alt
2011-12-09, 10:22
Brad Sarsfield
2011-12-09, 16:23
ArunKumar
2011-12-10, 03:09
|
-
Choosing IO intensive and CPU intensive workloads
ArunKumar 2011-12-09, 07:24
Hi guys!

I want to see how a single node of a Hadoop cluster behaves when an IO-intensive workload, a CPU-intensive workload, or a mix of both is submitted to that node alone. These workloads must stress the node. I see that the TestDFSIO benchmark is good for an IO-intensive workload.

1> Which benchmarks should I use for this?
2> How much input data is enough to see the behavior under these workloads for each type of box, given that I have:
   B1: 4 GB RAM, dual core, 150-250 GB disk
   B2: 1 GB RAM, 50-80 GB disk

Arun

--
View this message in context: http://lucene.472066.n3.nabble.com/Choosing-IO-intensive-and-CPU-intensive-workloads-tp3572282p3572282.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
-
Re: Choosing IO intensive and CPU intensive workloads
alo alt 2011-12-09, 07:53
Hi Arun,

Michael has written up a good tutorial on this, including stress tests and IO benchmarks:
http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com
Think of the environment: please don't print this email unless you really need to.
-
Re: Choosing IO intensive and CPU intensive workloads
ArunKumar 2011-12-09, 09:08
Alex,

Thanks for the link. My boxes have about 30-50 GB of free space, so obviously I can't run a full-size Terasort. What input size would be reasonable for seeing the behaviour when Terasort and TestDFSIO are run? Is there any benchmark for a mixed workload?

Arun
-
Re: Choosing IO intensive and CPU intensive workloads
alo alt 2011-12-09, 09:15
Hmm, the Pi or WordCount workloads could be useful. Sorry, I always keep topics like this as links:
http://developer.yahoo.com/hadoop/tutorial/module3.html#running => wordcount

I think some examples are included by default, like Pi:

    cd /usr/lib/hadoop-0.20/
    hadoop jar hadoop-examples.jar pi 10 10000000

- Alex
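A minimal sketch of how the Pi example above could be turned into a small CPU-bound series. The two arguments to `pi` are the number of maps and the number of samples per map, so raising the second one increases the compute load without adding much IO. The `EXAMPLES_JAR` variable, the `pi_cmd` helper, and the chosen sample counts are assumptions for illustration, not something from the thread. The script only prints the commands; it does not run them.

```shell
# Dry-run sketch: prints Pi job command lines instead of running them.
# EXAMPLES_JAR is an assumption; point it at the examples jar of your install.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-examples.jar}"

# pi_cmd MAPS SAMPLES -> prints the command line for one Pi run
# (hypothetical helper, named here only for this sketch)
pi_cmd() {
    echo "hadoop jar $EXAMPLES_JAR pi $1 $2"
}

# Keep the number of maps fixed and grow the samples per map 10x per step,
# so each run is more CPU-heavy than the previous one.
for samples in 100000 1000000 10000000; do
    echo "time $(pi_cmd 10 "$samples")"
done
```

Copying the printed lines into a shell on the node, one at a time, gives a wall-clock time per run that can be compared across B1 and B2.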
-
Re: Choosing IO intensive and CPU intensive workloads
ArunKumar 2011-12-09, 09:58
Alex,

To see the behavior of a single node under a compute-intensive benchmark, which parameters other than the finish time of the jobs are available or worth considering?

Arun
-
Re: Choosing IO intensive and CPU intensive workloads
alo alt 2011-12-09, 10:22
Hi Arun,

hadoop-*test*.jar contains a lot of test cases; could any of them match yours?

    #> cd /usr/lib/hadoop-0.20/ && hadoop jar hadoop-*test*.jar

- Alex
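As a hedged illustration of one test case from that jar, here is a dry-run sketch for TestDFSIO, the IO benchmark already mentioned in this thread. It uses the `-write`, `-read`, `-clean`, `-nrFiles` and `-fileSize` options (file size in MB). The `TEST_JAR` variable, the helper names, and the 10 x 500 MB starting point are assumptions chosen for illustration; pick sizes that fit the free disk space on each box.

```shell
# Dry-run sketch: prints TestDFSIO command lines instead of running them.
# TEST_JAR is an assumption; set it to the test jar shipped with your Hadoop.
TEST_JAR="${TEST_JAR:-hadoop-*test*.jar}"

# dfsio_cmd MODE NR_FILES FILE_MB -> command line for a write or read pass
# (hypothetical helper, named here only for this sketch)
dfsio_cmd() {
    echo "hadoop jar $TEST_JAR TestDFSIO -$1 -nrFiles $2 -fileSize $3"
}

# dfsio_total_mb NR_FILES FILE_MB -> total data written, before replication
dfsio_total_mb() {
    echo $(( $1 * $2 ))
}

NR_FILES=10   # hypothetical starting point
FILE_MB=500   # -fileSize is given in MB

echo "total: $(dfsio_total_mb "$NR_FILES" "$FILE_MB") MB"
dfsio_cmd write "$NR_FILES" "$FILE_MB"   # write pass first
dfsio_cmd read "$NR_FILES" "$FILE_MB"    # read pass reuses the written files
echo "hadoop jar $TEST_JAR TestDFSIO -clean"
```

The write pass has to run before the read pass, and the clean step removes the benchmark files afterwards so they do not eat into the disk budget for the next experiment.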
-
RE: Choosing IO intensive and CPU intensive workloads
Brad Sarsfield 2011-12-09, 16:23
Hi Arun,

TestDFSIO is good; I like Teragen/Terasort as an IO benchmark to help understand the IO capabilities of your hardware and network (run it at GB scale if you want to look at a single box). There are a number of dials you can turn in your experiment that will reveal different things about your setup.

The other thing you'll want to rationalize is the total number of tasks; a slight oversubscription of map/reduce tasks to cores, depending on your workload, may be a good place to start optimizing. Knowing what each of your hardware configurations (B1 and B2 in your case) is capable of will help you set expectations for what the box is physically able to do.

How?

    Generate: hadoop jar hadoop-examples-xxx-.jar teragen -conf terasort.xml 100000000 10GBsort-input
    Sort:     hadoop jar hadoop-examples-xxx-.jar terasort -conf terasort.xml 10GBsort-input 10GBsort-output

Then in terasort.xml you can play with many values. Remember to turn only one dial at a time. 10 GB should work in your case.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value> <!-- 536870912 == 512 MB, 268435456 == 256 MB, 134217728 == 128 MB -->
  </property>
.... etc
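The numbers in the commands and config above can be derived with a little arithmetic. The sketch below assumes teragen's fixed 100-byte rows and decimal gigabytes (10^9 bytes), which is what makes 100000000 rows come out at about 10 GB; the helper names are hypothetical and exist only for this illustration.

```shell
# Sketch: derive the teragen row count and dfs.block.size values.
# Assumes 100-byte teragen rows and decimal GB (10^9 bytes).

# teragen_rows GB -> number of rows to request for roughly GB gigabytes
# (10^9 bytes / 100 bytes per row = 10000000 rows per GB)
teragen_rows() {
    echo $(( $1 * 10000000 ))
}

# block_size_bytes MB -> value for dfs.block.size, which is given in bytes
block_size_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

for gb in 1 5 10; do
    echo "$gb GB -> $(teragen_rows "$gb") rows"
done

for mb in 128 256 512; do
    echo "$mb MB -> $(block_size_bytes "$mb") bytes"
done
```

When sizing for a box with limited disk, keep in mind that the sort needs room for the generated input and the sorted output at the same time, plus intermediate map output, so the free space required is a multiple of the input size even with dfs.replication set to 1.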
-
Re: Choosing IO intensive and CPU intensive workloads
ArunKumar 2011-12-10, 03:09
Brad,

Thanks for the clear explanation. To see the compute capabilities of my boxes' hardware, which benchmarks can I use, and which values should I look at apart from task run time?

Arun
|