-
Hardware specs calculation for io
Sandeep Reddy P 2012-06-13, 14:36
Hi, I need to know the difference between the two hardware configurations below for 24 TB of data (slave machines only, for Hadoop, Hive, and Pig).
TYPE A: 2 quad core, 32 GB memory, 6 x 1 TB drives (6 TB / machine)
TYPE B: 4 quad core, 48 GB memory, 12 x 1 TB drives (12 TB / machine)
Suppose we choose 4 Type A machines for 24 TB of data, or 2 Type B machines for the same 24 TB. Assume disk I/O speed is constant (7200 RPM SATA) and the cost is the same for 4 Type A and 2 Type B machines.
I need to know which type of machine will give me the best results in terms of performance.
-- Thanks, sandeep
-
Re: Hardware specs calculation for io
Matt Davies 2012-06-13, 15:44
Sandeep,
I think one critical piece missing is whether you are counting the 24 TB as raw or as replicated. In a standard environment with a replication factor of 3, you really need 72 TB of disk space, which triples your hardware requirements.
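As a rough illustration of the replication arithmetic above, the following sketch computes the raw disk needed and the resulting machine counts for each type. The function names are hypothetical, and the calculation assumes the 24 TB figure is unreplicated data and ignores any headroom for temporary or intermediate data.

```python
import math

def raw_capacity_needed_tb(data_tb, replication=3):
    """Disk space needed once every block is stored `replication` times."""
    return data_tb * replication

def machines_needed(data_tb, disk_tb_per_machine, replication=3):
    """Smallest number of machines whose combined disks hold the replicated data.

    Assumption: no headroom is reserved for temp space or growth.
    """
    total_tb = raw_capacity_needed_tb(data_tb, replication)
    return math.ceil(total_tb / disk_tb_per_machine)

needed_tb = raw_capacity_needed_tb(24)   # 24 TB at replication factor 3
type_a_count = machines_needed(24, 6)    # Type A: 6 TB per machine
type_b_count = machines_needed(24, 12)   # Type B: 12 TB per machine

print(needed_tb, type_a_count, type_b_count)
```

Under these assumptions the 4 Type A or 2 Type B machines in the question would hold only a third of the replicated data.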
Regardless, my experience has been to favor A and scale out rather than scale up. A simple metric: 2 quad cores would equate to 8+ worker threads on A, and B would have 16+. So, if you take out 1-2 GB for the OS, 1 GB for the JT, and 1 GB for the DN, you have 28/8 (~3.5 GB) for each worker on A. The same overhead on B would leave 44/16 (2.75 GB) per worker. This is but one metric.
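The memory-per-worker estimate above can be written out as a small sketch. The function and parameter names are hypothetical; the overhead defaults use the upper end of the 1-2 GB OS figure, which is what the 28 and 44 GB numbers imply, and the worker counts are the lower bounds (8 and 16) given above.

```python
def memory_per_worker_gb(total_gb, workers, os_gb=2, jt_gb=1, dn_gb=1):
    """Memory left per worker after subtracting the fixed per-node overheads."""
    usable_gb = total_gb - os_gb - jt_gb - dn_gb
    return usable_gb / workers

type_a = memory_per_worker_gb(32, 8)    # (32 - 4) / 8
type_b = memory_per_worker_gb(48, 16)   # (48 - 4) / 16

print(type_a, type_b)
```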
The other is the amount of HD per core. I've heard anywhere from 0.8 to 1.5 TB/core, so that would definitely favor A.
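For reference, the disk-per-core ratio can be computed directly from the specs in the original post. This is only a sketch with a hypothetical helper name, and it counts raw drive capacity per physical core.

```python
def tb_per_core(disk_tb, cores):
    """Raw disk capacity per core for one machine."""
    return disk_tb / cores

type_a_ratio = tb_per_core(6, 8)     # Type A: 6 x 1 TB drives, 2 quad cores
type_b_ratio = tb_per_core(12, 16)   # Type B: 12 x 1 TB drives, 4 quad cores

print(type_a_ratio, type_b_ratio)
```

With the figures as posted, both types come out to the same ratio, so this metric on its own may not separate them.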
Perhaps the biggest factor of all is expected workload. Will you be computationally bound or I/O bound? I.e., all things being equal hardware-wise, will you be spending most of your time crunching or reading data?
A few thoughts.
-Matt
-
Re: Hardware specs calculation for io
Sandeep Reddy P 2012-06-13, 17:23
Thanks for the reply, Matt. We have 6 TB of raw data. We are I/O bound.
-- Thanks, sandeep
-
Re: Hardware specs calculation for io
Michael Segel 2012-06-13, 18:52
You will want something in between...
8 cores means 8 spindles.
16 cores means 16 spindles.
You may want to up the memory, especially if you're running or thinking about running HBase.
If you go beyond 4 spindles, you will saturate your 1 GbE link. If you think about Type B, you will need 10 GbE.
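A minimal sketch of the link-saturation reasoning, comparing the combined throughput of a node's disks with its network link. The per-disk throughput is not given in this thread; the 50 MB/s used below is a placeholder assumption, and the function names are hypothetical. Real sustained rates depend on the drives and the access pattern, so substitute measured values.

```python
def link_capacity_mb_s(link_gbit):
    """Theoretical link capacity in MB/s (1 Gbit/s = 125 MB/s), ignoring protocol overhead."""
    return link_gbit * 1000 / 8

def aggregate_disk_mb_s(spindles, per_disk_mb_s):
    """Combined throughput if every spindle streams at the same rate."""
    return spindles * per_disk_mb_s

def saturates_link(spindles, per_disk_mb_s, link_gbit):
    """True if the disks together can push more data than the link can carry."""
    return aggregate_disk_mb_s(spindles, per_disk_mb_s) > link_capacity_mb_s(link_gbit)

ASSUMED_DISK_MB_S = 50  # placeholder assumption, not a figure from the thread

type_a_on_1gbe = saturates_link(6, ASSUMED_DISK_MB_S, 1)     # 6 spindles, 1 GbE
type_b_on_1gbe = saturates_link(12, ASSUMED_DISK_MB_S, 1)    # 12 spindles, 1 GbE
type_b_on_10gbe = saturates_link(12, ASSUMED_DISK_MB_S, 10)  # 12 spindles, 10 GbE

print(type_a_on_1gbe, type_b_on_1gbe, type_b_on_10gbe)
```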