NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
HBase >> mail # user >> Acceptable CPU_WIO % ?


Jean-Marc Spaggiari 2013-02-08, 01:19
Kevin Odell 2013-02-08, 01:43
Jean-Marc Spaggiari 2013-02-08, 02:00
Kevin Odell 2013-02-08, 02:21
Jean-Marc Spaggiari 2013-02-08, 02:50
Kevin Odell 2013-02-08, 02:57
Jean-Marc Spaggiari 2013-02-08, 03:15
Azuryy Yu 2013-02-08, 03:23
Kevin Odell 2013-02-08, 13:56
Re: Acceptable CPU_WIO % ?
Hi Kevin,

I think it will take time before I get a chance to have 5 drives in
the same server, so I will test RAID5 at that time.

I'm going to add one drive per server today or tomorrow to try to
improve that. What IOPS should I aim for? 100? Less? They will all
be SATA3 drives, and I will configure them all in RAID0.
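One rough way to frame the "what IOPS per drive?" question is to divide the measured request rate by a per-drive budget. This is a minimal sketch, not a sizing tool. It treats sar's tps as an approximation of IOPS and assumes requests spread evenly across drives (idealized RAID0 striping). The per-drive budgets of 100 and 150 IOPS are hypothetical inputs. The 214.54 figure is the "Average" tps from the sar output quoted further down in this thread.

```python
import math


def drives_needed(total_iops: float, per_drive_iops: float) -> int:
    """Smallest drive count that keeps each drive at or below per_drive_iops,
    assuming requests are spread evenly across drives (idealized striping)."""
    if per_drive_iops <= 0:
        raise ValueError("per_drive_iops must be positive")
    return max(1, math.ceil(total_iops / per_drive_iops))


def per_drive_load(total_iops: float, drives: int) -> float:
    """Requests per second each drive sees under the same even-spread assumption."""
    if drives < 1:
        raise ValueError("drives must be >= 1")
    return total_iops / drives


if __name__ == "__main__":
    measured_tps = 214.54  # "Average" tps from the sar output in this thread
    for target in (100.0, 150.0):  # hypothetical per-drive IOPS budgets
        n = drives_needed(measured_tps, target)
        print(f"budget {target:.0f} IOPS/drive -> {n} drive(s), "
              f"{per_drive_load(measured_tps, n):.1f} IOPS each")
```

Under these assumptions a 100 IOPS budget works out to 3 drives at about 71.5 IOPS each. Real striping is never perfectly even, so some headroom is wise.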

It doesn't seem to me to be an issue to lose one node, since the data
will be replicated everywhere else. I will "simply" have to replace
the failing disk and restart the node, no?
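Here is a rough number for the RAID0-versus-JBOD trade-off discussed in this thread. With RAID0, any single drive failure loses the whole volume. With JBOD, only the failed disk's share is lost. This is a minimal sketch under stated assumptions: drive failures are independent, each drive has the same failure probability over some period, and data is spread evenly across the disks. The 5% figure is a hypothetical input, not a measured failure rate.

```python
def raid0_loss_probability(p_drive: float, drives: int) -> float:
    """Probability that a RAID0 volume over `drives` disks is lost during a period.

    Any single disk failure destroys the striped volume, so this is the chance
    that at least one disk fails, assuming independent failures that each
    happen with probability p_drive over the period.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be in [0, 1]")
    if drives < 1:
        raise ValueError("drives must be >= 1")
    return 1.0 - (1.0 - p_drive) ** drives


def expected_fraction_lost(p_drive: float, drives: int, raid0: bool) -> float:
    """Expected fraction of a node's local data lost (and re-replicated by HDFS).

    RAID0: all of it whenever any disk fails.
    JBOD: each disk holds about 1/drives of the data and drives * p_drive disks
    fail on average, so the expectation is p_drive regardless of drive count.
    """
    whole_volume = raid0_loss_probability(p_drive, drives)  # also validates inputs
    return whole_volume if raid0 else p_drive


if __name__ == "__main__":
    p = 0.05  # hypothetical per-drive failure probability for the period
    for n in (1, 2, 4):
        print(f"{n} drive(s): RAID0 {expected_fraction_lost(p, n, True):.1%}, "
              f"JBOD {expected_fraction_lost(p, n, False):.1%}")
```

Under those assumptions, a 4-drive RAID0 node expects to lose about 18.5% of its local data per period, versus 5% for JBOD. HDFS therefore has more to re-replicate with RAID0, even though the repair is still one replacement disk.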

JM

2013/2/8, Kevin O'dell <[EMAIL PROTECTED]>:
> Azuryy,
>
>   The main reason to recommend against RAID is that it is slow and it adds
> redundancy that we already have in Hadoop.  RAID0 is another story as long
> as all of the drives are healthy and you don't mind losing the whole volume
> if you lose one drive.
>
> JM,
>
>   I would not even waste my time testing RAID5 or RAID6 (unless it is just
> for educational purposes :) ).  200+ IOPS consistently on one SATA drive is
> pretty high; that would explain your high I/O wait time.  If your use case
> allows for you to lose the whole node, there is not a good reason for you
> to shy away from RAID0.  Please let us know how this plays out with your
> environment.
>
> On Thu, Feb 7, 2013 at 10:23 PM, Azuryy Yu <[EMAIL PROTECTED]> wrote:
>
>> JM,
>>
>> I don't have the context, but if you are using Hadoop/HBase, don't use
>> RAID on your disks.
>>
>>
>> On Fri, Feb 8, 2013 at 11:15 AM, Jean-Marc Spaggiari <
>> [EMAIL PROTECTED]> wrote:
>>
>> > OK, I see. For my use case I prefer to lose the data and have a faster
>> > process, so I will go for RAID0 and keep the replication factor at
>> > 3. If at some point I have 5 disks in the node, I will most probably
>> > give RAID5 a try and compare its performance to the other
>> > RAID/JBOD options.
>> >
>> > Is there a "rule", like 1 HD per core? Or can't we really simplify
>> > it that much?
>> >
>> > So far I have this in the sar output:
>> > 21:35:03          tps      rtps      wtps   bread/s   bwrtn/s
>> > 21:45:03       218,85    215,97      2,88  45441,95    308,04
>> > 21:55:02       209,73    206,67      3,06  43985,28    378,32
>> > 22:05:04       215,03    211,71      3,33  44831,00    312,95
>> > Average :      214,54    211,45      3,09  44753,09    333,07
>> >
>> > But I'm not sure what it means. I will wait until tomorrow to get more
>> > results, but my job will run overnight, so I'm not sure the
>> > average will be accurate...
>> >
>> > JM
>> >
>> >
>> > 2013/2/7, Kevin O'dell <[EMAIL PROTECTED]>:
>> > > JM,
>> > >
>> > >   I think you misunderstood me.  I am not advocating any form of RAID
>> > > for Hadoop.  It is true that we already have redundancy built in with
>> > > HDFS.  So unless you were going to do something silly like sacrifice
>> > > speed to run RAID1 or RAID5 and lower your replication to 2... just
>> > > don't do it :)
>> > >   Anyway, yes, you should probably have 3 - 4 drives per node, if not
>> > > more.  At that point you will really see the benefit of JBOD over
>> > > RAID0.
>> > >
>> > > Do you want to be able to lose a drive and keep the node up?  If yes,
>> > > then JBOD is for you.  Don't care if you lose that node due to a drive
>> > > failure, and just need speed?  Then RAID0 may be the correct choice.
>> > > Sar will take some time to populate.  Give it about 24 hours and you
>> > > should be able to glean some interesting information.
>> > >
>> > > On Thu, Feb 7, 2013 at 9:50 PM, Jean-Marc Spaggiari
>> > > <[EMAIL PROTECTED]> wrote:
>> > >
>> > >> OK, I see: RAID0 might be better for me compared to JBOD. Also,
>> > >> why would we want to use RAID1 or RAID5? We already have the
>> > >> redundancy done by Hadoop; isn't that going to add another,
>> > >> unneeded level of redundancy?
>> > >>
>> > >> Should I already plan on having 3 or even 4 drives in each node?
>> > >>
>> > >> I tried sar -A and it's only giving me 2 lines.
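JM was unsure how to read the sar numbers quoted above. Their columns (tps, rtps, wtps, bread/s, bwrtn/s) match the `sar -b` I/O report, though which report was run is an assumption here. sysstat documents that report's block counts as 512-byte sectors. This minimal sketch parses that output, including the decimal commas from the locale in use. It then derives the read share, the read throughput, and the average read size per request.

```python
from dataclasses import dataclass
from typing import Iterable, List

BLOCK_BYTES = 512  # sysstat documents sar -b blocks as 512-byte sectors


@dataclass
class SarIoSample:
    label: str      # timestamp, or "Average :" for the summary row
    tps: float      # total transfers (I/O requests) per second
    rtps: float     # read requests per second
    wtps: float     # write requests per second
    bread_s: float  # blocks read per second
    bwrtn_s: float  # blocks written per second

    def read_fraction(self) -> float:
        return self.rtps / self.tps if self.tps else 0.0

    def read_mb_per_s(self) -> float:
        return self.bread_s * BLOCK_BYTES / 1e6

    def avg_read_kib(self) -> float:
        # average size of a single read request, in KiB
        return self.bread_s * BLOCK_BYTES / self.rtps / 1024 if self.rtps else 0.0


def parse_sar_b(lines: Iterable[str]) -> List[SarIoSample]:
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        try:
            # the locale in this thread prints decimal commas ("218,85")
            nums = [float(f.replace(",", ".")) for f in fields[-5:]]
        except ValueError:
            continue  # header row: "tps rtps wtps bread/s bwrtn/s"
        samples.append(SarIoSample(" ".join(fields[:-5]), *nums))
    return samples


SAR_OUTPUT = """\
21:35:03          tps      rtps      wtps   bread/s   bwrtn/s
21:45:03       218,85    215,97      2,88  45441,95    308,04
21:55:02       209,73    206,67      3,06  43985,28    378,32
22:05:04       215,03    211,71      3,33  44831,00    312,95
Average :      214,54    211,45      3,09  44753,09    333,07
"""

if __name__ == "__main__":
    for s in parse_sar_b(SAR_OUTPUT.splitlines()):
        print(f"{s.label:>10}  reads {s.read_fraction():.1%}  "
              f"{s.read_mb_per_s():.1f} MB/s  avg read {s.avg_read_kib():.0f} KiB")
```

The "Average" row comes out at roughly 98.6% reads, about 22.9 MB/s read, and an average read of about 106 KiB. That is an almost purely read-driven load on the single drive discussed in this thread.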
Kevin Odell 2013-02-08, 16:37
Jean-Marc Spaggiari 2013-02-09, 16:13