


Hardware specs calculation for io
Hi, I need to know the difference between the two hardware configurations below for 24 TB of data (slave machines only, for Hadoop, Hive and Pig).
TYPE A: 2 quad-core CPUs, 32 GB memory, 6 x 1 TB drives (6 TB / machine)
TYPE B: 4 quad-core CPUs, 48 GB memory, 12 x 1 TB drives (12 TB / machine)
Suppose we choose either 4 Type A machines or 2 Type B machines for 24 TB of data. Assume disk I/O speed is constant (7200 RPM SATA) and the cost is the same for 4 Type A and 2 Type B machines.
Which type of machine will give me the best performance? Thanks, sandeep

Re: Hardware specs calculation for io
Sandeep,
I think one critical piece missing is whether you are counting the 24 TB as raw or as replicated. In a standard environment with a replication factor of 3, you really need 72 TB of disk space, which triples your hardware requirements.
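As a rough sketch of that arithmetic (the function name `machines_needed` is made up for illustration, and no headroom for intermediate or temporary data is included):

```python
import math


def machines_needed(logical_tb, replication, tb_per_machine):
    """Return (raw TB required, machine count) for replicated storage."""
    raw_tb = logical_tb * replication
    return raw_tb, math.ceil(raw_tb / tb_per_machine)


# 24 TB of data at replication factor 3, on 6 TB (Type A) or 12 TB (Type B) nodes.
for name, tb_per_machine in (("A", 6), ("B", 12)):
    raw_tb, count = machines_needed(24, 3, tb_per_machine)
    print(f"Type {name}: {raw_tb} TB raw -> {count} machines")
```

If the 24 TB is already the replicated figure, pass a replication of 1 to get back the 4 Type A or 2 Type B machines from the original question.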
Regardless, my experience has been to favor A and scale out rather than scale up. A simple metric: 2 quad cores on A equate to 8+ worker threads, and B would be 16+. So, if you take out 2 GB for the OS, 1 GB for the TaskTracker, and 1 GB for the DataNode, you have 28/8 (~3.5 GB) for each worker. The same overhead on B would leave 44/16 (2.75 GB) per worker. This is but one metric.
The other is the amount of hard disk per core. I've heard anywhere from 0.8 to 1.5 TB/core, so that would definitely favor A.
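A minimal sketch of those two metrics, assuming one worker per core and a flat 4 GB of per-node overhead (the amount implied by the 28/8 and 44/16 figures). The function and constant names are illustrative only:

```python
# Per-node figures taken from the original question.
CONFIGS = {
    "A": {"cores": 8, "mem_gb": 32, "disk_tb": 6},
    "B": {"cores": 16, "mem_gb": 48, "disk_tb": 12},
}

# Assumption: 4 GB reserved per node for the OS and Hadoop daemons,
# which is what the 28/8 and 44/16 figures imply.
OVERHEAD_GB = 4


def memory_per_worker_gb(mem_gb, cores, overhead_gb=OVERHEAD_GB):
    """Memory left for each worker, assuming one worker per core."""
    return (mem_gb - overhead_gb) / cores


def disk_per_core_tb(disk_tb, cores):
    """Raw disk capacity available per core."""
    return disk_tb / cores


for name, c in CONFIGS.items():
    mem = memory_per_worker_gb(c["mem_gb"], c["cores"])
    disk = disk_per_core_tb(c["disk_tb"], c["cores"])
    print(f"Type {name}: {mem:.2f} GB/worker, {disk:.2f} TB/core")
```

With the specs as posted, disk per core works out to 0.75 TB for both types, so in this sketch it is the memory-per-worker figure (3.5 GB vs 2.75 GB) that separates A from B.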
Perhaps the biggest factor of all is the expected workload. Will you be computationally bound or I/O bound? I.e., all things being equal hardware-wise, will you spend most of your time crunching or reading data?
A few thoughts.
Matt
On Wed, Jun 13, 2012 at 7:36 AM, Sandeep Reddy P < [EMAIL PROTECTED]> wrote:

Re: Hardware specs calculation for io
Thanks for the reply, Matt. We have 6 TB of raw data, and we are I/O bound.
On Wed, Jun 13, 2012 at 11:44 AM, Matt Davies <[EMAIL PROTECTED]> wrote:
Thanks, sandeep

Re: Hardware specs calculation for io
You will want something in between...
8 cores means 8 spindles.
16 cores means 16 spindles.
You may want to up the memory, especially if you're running or thinking about running HBase.
If you go beyond 4 spindles, you will saturate your 1 GbE link. If you are considering Type B, you will need 10 GbE.
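A back-of-the-envelope sketch of the spindle vs. link comparison. It takes the worst case where every byte read from disk leaves the node, and the 30 MB/s per-spindle rate is a hypothetical effective figure chosen only because it lines up with the four-spindle rule of thumb; measure your own disks before relying on it. Function names are illustrative:

```python
def link_mb_per_s(gbit_per_s):
    """Nominal link capacity in MB/s (1 Gbit/s = 125 MB/s), ignoring protocol overhead."""
    return gbit_per_s * 1000 / 8


def spindles_to_saturate(gbit_per_s, disk_mb_per_s):
    """Number of disks whose combined throughput equals the link capacity."""
    return link_mb_per_s(gbit_per_s) / disk_mb_per_s


# Assumption: hypothetical effective throughput per 7200 RPM SATA disk under load.
ASSUMED_DISK_MB_PER_S = 30.0

for gbit in (1, 10):
    n = spindles_to_saturate(gbit, ASSUMED_DISK_MB_PER_S)
    print(f"{gbit} GbE: about {n:.1f} spindles at {ASSUMED_DISK_MB_PER_S:.0f} MB/s each")
```

Under that assumption a 1 GbE link keeps up with roughly 4 disks, while the 12 disks of Type B only fit comfortably under 10 GbE. Data locality in MapReduce reduces the share of reads that cross the network, so real clusters may tolerate more spindles per link than this worst case suggests.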

