HBase >> mail # user >> Acceptable CPU_WIO % ?


Re: Acceptable CPU_WIO % ?
Perfect, thanks Kevin.

Looking at sar this morning, I can see that I'm sometimes reaching
300 tps, with spikes at 80% WIO... That will cost me 5 additional
hard drives :(
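
For readers following the numbers: the tps column reported by sar -b is the number of transfer requests per second issued to physical devices, which is what this thread loosely calls IOPS. The sketch below parses the three samples from the sar output quoted further down (they use a comma as decimal separator), averages tps, and estimates how many drives would keep each one under a chosen per-drive limit. The helper names `parse_tps` and `drives_needed` are hypothetical, and the estimate assumes that the total load stays the same and spreads evenly across drives, which a real workload may not do.

```python
import math

# Samples copied from the sar output quoted in this thread
# (columns: time, tps, rtps, wtps, bread/s, bwrtn/s).
SAR_LINES = """\
21:35:03          tps      rtps      wtps   bread/s   bwrtn/s
21:45:03       218,85    215,97      2,88  45441,95    308,04
21:55:02       209,73    206,67      3,06  43985,28    378,32
22:05:04       215,03    211,71      3,33  44831,00    312,95
"""


def parse_tps(text):
    """Return the tps column (second field) as floats.

    Accepts ',' as decimal separator and skips header or blank lines.
    """
    values = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            values.append(float(fields[1].replace(",", ".")))
        except ValueError:
            continue  # header line such as "... tps rtps wtps ..."
    return values


def drives_needed(total_tps, per_drive_limit):
    """Drives required so that total_tps / drives <= per_drive_limit.

    ASSUMPTION: requests are spread evenly over all drives.
    """
    return math.ceil(total_tps / per_drive_limit)


tps_values = parse_tps(SAR_LINES)
mean_tps = sum(tps_values) / len(tps_values)
print(f"mean tps: {mean_tps:.2f}")
for limit in (100, 150, 200):
    print(f"limit {limit}/drive -> {drives_needed(mean_tps, limit)} drive(s)")
```

The limits 100, 150 and 200 are the values mentioned in the thread. For the 300 tps peak, `drives_needed(300, 150)` returns 2 under the same even-spread assumption.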

I’m not sure I will have time to install them all today, but as soon
as it’s done I will give you some news.

JM
2013/2/8, Kevin O'dell <[EMAIL PROTECTED]>:
> JM,
>
>   Basically, you will have to replace the failed disk and rebuild the RAID0
> array, since the other half of the data is worthless.  There is not a real
> recommended value, but anything under 150 - 200 IOPS would make me more
> comfortable.
>
> On Fri, Feb 8, 2013 at 10:43 AM, Jean-Marc Spaggiari <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Kevin,
>>
>> I think it will take time before I get a chance to have 5 drives in
>> the same server, so I will test RAID5 when that happens.
>>
>> I'm going to add one drive per server today or tomorrow to try to
>> improve that. What IOPS should I aim for? 100? Less? They will all
>> be SATA3 drives and I will configure them all in RAID0.
>>
>> It doesn't seem to me to be an issue to lose one node, since data
>> will be replicated everywhere else. I will "simply" have to replace
>> the failing disk and restart the node, no?
>>
>> JM
>>
>> 2013/2/8, Kevin O'dell <[EMAIL PROTECTED]>:
>> > Azuryy,
>> >
>> >   The main reason to recommend against RAID is that it is slow and it
>> > adds redundancy that we already have in Hadoop.  RAID0 is another story
>> > as long as all of the drives are healthy and you don't mind losing the
>> > whole volume if you lose one drive.
>> >
>> > JM,
>> >
>> >   I would not even waste my time testing RAID5 or RAID6 (unless it is
>> > just for educational purposes :) ).  200+ IOPS consistently on one SATA
>> > drive is pretty high; that would explain your high I/O wait time.  If
>> > your use case allows for you to lose the whole node, there is not a good
>> > reason for you to shy away from RAID0.  Please let us know how this plays
>> > out in your environment.
>> >
>> > On Thu, Feb 7, 2013 at 10:23 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>> >
>> >> JM,
>> >>
>> >> I don't have the context, but if you are using Hadoop/HBase, don't do
>> >> RAID on your disks.
>> >>
>> >>
>> >> On Fri, Feb 8, 2013 at 11:15 AM, Jean-Marc Spaggiari <
>> >> [EMAIL PROTECTED]> wrote:
>> >>
>> >> > Ok. I see. For my use case I prefer to lose the data and have faster
>> >> > processing. So I will go for RAID0 and keep the replication factor at
>> >> > 3... If at some point I have 5 disks in the node, I will most probably
>> >> > give RAID5 a try and compare its performance with the other
>> >> > RAID/JBOD options.
>> >> >
>> >> > Is there a "rule", like 1 HD per core? Or can't we really simplify
>> >> > that much?
>> >> >
>> >> > So far I have that in the sar output:
>> >> > 21:35:03          tps      rtps      wtps   bread/s   bwrtn/s
>> >> > 21:45:03       218,85    215,97      2,88  45441,95    308,04
>> >> > 21:55:02       209,73    206,67      3,06  43985,28    378,32
>> >> > 22:05:04       215,03    211,71      3,33  44831,00    312,95
>> >> > Average :      214,54    211,45      3,09  44753,09    333,07
>> >> >
>> >> > But I'm not sure what it means. I will wait for tomorrow to get more
>> >> > results, but my job will be done overnight, so I'm not sure the
>> >> > average will be accurate...
>> >> >
>> >> > JM
>> >> >
>> >> >
>> >> > 2013/2/7, Kevin O'dell <[EMAIL PROTECTED]>:
>> >> > > JM,
>> >> > >
>> >> > >   I think you misunderstood me.  I am not advocating any form of RAID
>> >> > > for Hadoop.  It is true that we already have redundancy built in with
>> >> > > HDFS.  So unless you were going to do something silly like sacrifice
>> >> > > speed to run RAID1 or RAID5 and lower your replication to 2...just
>> >> > > don't do it :)  Anyway, yes, you probably should have 3 - 4 drives per
>> >> > > node, if not more.  At that point you will really see the benefit of
>> >> > > JBOD