

Re: recommendation on HDDs
On Fri, Feb 11, 2011 at 7:14 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Bandwidth is definitely better with more active spindles.  I would recommend
> several larger disks.  The cost is very nearly the same.
>
> On Fri, Feb 11, 2011 at 3:52 PM, Shrinivas Joshi <[EMAIL PROTECTED]> wrote:
>
>> Thanks for your inputs, Michael.  We have 6 open SATA ports on the
>> motherboards. That is the reason why we are thinking of 4 to 5 data disks
>> and 1 OS disk.
>> Are you suggesting using one 2TB disk instead of, let's say, four 500GB
>> disks?
>> I thought that HDFS utilization/throughput increases with the number of
>> disks per node (assuming that the total usable IO bandwidth increases
>> proportionally).
>>
>> -Shrinivas
>>
>> On Thu, Feb 10, 2011 at 4:25 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>>
>> >
>> > Shrinivas,
>> >
>> > Assuming you're in the US, I'd recommend the following:
>> >
>> > Go with 2TB 7200 RPM SATA hard drives.
>> > (Not sure what type of hardware you have)
>> >
>> > What we've found is that in the data nodes, there's an optimal
>> > configuration that balances price versus performance.
>> >
>> > While your chassis may hold 8 drives, how many open SATA ports are on the
>> > motherboard? Since you're using JBOD, you don't want the additional
>> > expense of having to purchase a separate controller card for the
>> > additional drives.
>> >
>> > I'm running Seagate drives at home and I haven't had any problems for
>> > years.
>> > When you look at your drive, you need to know total storage, speed
>> > (RPM), and cache size.
>> > Looking at Microcenter's pricing:
>> > A 2TB 3.0Gb/s SATA Hitachi was $110.00
>> > A 1TB Seagate was $70.00
>> > A 250GB SATA drive was $45.00
>> >
>> > So 2TB = 110, 140, 180 (respectively)
>> >
>> > So you get a better deal on 2TB.
>> >
>> > So if you go out and get more drives but of lower density, you'll end up
>> > spending more money and use more energy, but I doubt you'll see a real
>> > performance difference.
>> >
>> > The other thing is that if you want to add more disk, you have room to
>> > grow. (Just add more disk and restart the node, right?)
>> > If all of your disk slots are filled, you're SOL. You have to take out
>> > the box, replace all of the drives, then add it to the cluster as a
>> > 'new' node.
>> >
>> > Just my $0.02.
>> >
>> > HTH
>> >
>> > -Mike
>> >
>> > > Date: Thu, 10 Feb 2011 15:47:16 -0600
>> > > Subject: Re: recommendation on HDDs
>> > > From: [EMAIL PROTECTED]
>> > > To: [EMAIL PROTECTED]
>> > >
>> > > Hi Ted, Chris,
>> > >
>> > > Much appreciate your quick reply. The reason we are looking at smaller
>> > > capacity drives is that we are not anticipating huge growth in our data
>> > > footprint. We also read somewhere that the larger the capacity of a
>> > > drive, the more platters it has, and that this could affect drive
>> > > performance. But it looks like you can get 1TB drives with only 2
>> > > platters. Large capacity drives should be OK for us as long as they
>> > > perform equally well.
>> > >
>> > > Also, the systems that we have can host up to 8 SATA drives. In that
>> > > case, would backplanes offer additional advantages?
>> > >
>> > > Any suggestions on 5400 vs. 7200 vs. 10000 RPM disks? I guess 10K RPM
>> > > disks would be overkill considering their perf/cost advantage?
>> > >
>> > > Thanks for your inputs.
>> > >
>> > > -Shrinivas
>> > >
>> > > On Thu, Feb 10, 2011 at 2:48 PM, Chris Collins <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Of late we have had serious issues with Seagate drives in our Hadoop
>> > > > cluster. These were purchased over several purchasing cycles, and we
>> > > > are pretty sure it wasn't just a single "bad batch". Because of this
>> > > > we switched to buying 2TB Hitachi drives, which seem to have been
>> > > > considerably more reliable.
>> > > >
>> > > > Best
>> > > >
>> > > > C
>> > > > On Feb 10, 2011, at 12:43 PM, Ted Dunning wrote:

You also do not need a dedicated OS disk. I typically slice off small
partitions on some of the data disks and set up a software mirror across
them. This gives you redundancy without having to sacrifice one or two
disk slots to smaller disks.
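
For anyone who wants to redo Mike's price comparison from further up the thread, here is a minimal Python sketch. It uses only the three Microcenter prices he quoted. The function names are made up for illustration, and treating 1TB as 1000GB is my assumption. With those inputs, eight 250GB drives at $45 come to $360. The $180 in the quoted message would correspond to four drives at $45. The thread does not say whether the drive size or the total was the slip. The conclusion that the 2TB drive is the cheapest way to reach 2TB holds either way.

```python
import math

# Per-drive prices as quoted in the thread (Microcenter, Feb 2011).
# Keys are capacities in GB; treating 1TB as 1000GB is an assumption.
QUOTED_PRICES = {2000: 110.00, 1000: 70.00, 250: 45.00}


def cost_for_capacity(target_gb, drive_gb, unit_price):
    """Return (drive_count, total_cost) to reach target_gb using one drive size."""
    count = math.ceil(target_gb / drive_gb)
    return count, count * unit_price


def compare(target_gb=2000, prices=QUOTED_PRICES):
    """Map each drive size to the (drive_count, total_cost) needed for target_gb."""
    return {
        size: cost_for_capacity(target_gb, size, price)
        for size, price in prices.items()
    }


if __name__ == "__main__":
    for size, (count, total) in compare().items():
        print(f"{size}GB drives: {count} needed, ${total:.2f} total")
```

Note that this only compares purchase price per unit of capacity. It says nothing about the spindle-count and bandwidth question Ted and Shrinivas raise, or about the number of SATA ports used.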