Re: recommended nodes
Hmm, I thought that RAID0 simply stripes across all disks. So if you have 4
disks, an HFile block, for example, could get striped across all 4. To read
that block you would need all 4 disks to seek so that you could read all 4
stripes of the HFile block, which could make that random read as slow as the
slowest-seeking disk. Certainly, the data transfer rate would be much faster
with RAID0, but since an HFile block is merely 64K, I would have expected
seek latency to play the major role rather than transfer latency.

However, your tests indeed show that RAID0 still outperforms JBOD on seeks.
Am I missing something?
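
A minimal sketch of that stripe arithmetic, assuming a 4-disk md-style RAID0
and a configurable chunk (stripe-unit) size; with the common 64K-256K chunks,
a 64K HFile-block read touches one or two member disks rather than all four:

// Assumptions: 4-disk RAID0, a per-disk stripe unit ("chunk"), and a
// 64 KB HFile block starting at an arbitrary byte offset in the volume.
public class StripeMath {
    static int disksTouched(long offset, long length, long chunkSize, int nDisks) {
        long firstChunk = offset / chunkSize;
        long lastChunk = (offset + length - 1) / chunkSize;
        long chunksSpanned = lastChunk - firstChunk + 1;
        // A read needs no more distinct disks than the chunks it spans,
        // and never more than the array has.
        return (int) Math.min(chunksSpanned, nDisks);
    }

    public static void main(String[] args) {
        long hfileBlock = 64 * 1024;   // default HFile block size
        int nDisks = 4;
        for (long chunk : new long[]{16 * 1024, 64 * 1024, 256 * 1024}) {
            // Worst case: the block starts just before a chunk boundary.
            int worst = disksTouched(chunk - 1, hfileBlock, chunk, nDisks);
            System.out.printf("chunk=%dK -> up to %d disk(s) per 64K read%n",
                    chunk / 1024, worst);
        }
    }
}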

On Thu, Dec 20, 2012 at 1:26 PM, Jean-Marc Spaggiari <
[EMAIL PROTECTED]> wrote:

> Hi Varun,
>
> The hard drives I used are now in use on the hadoop/hbase cluster, but they
> were wiped and formatted for the tests I did. The computer where I ran those
> tests was one of the region servers. It was re-installed to be completely
> clean, and it's now running a datanode and a RS.
>
> Regarding RAID, I think you are confusing RAID0 and RAID1. It's RAID1 that
> needs to access the two copies each time. RAID0 is more like JBOD, but faster.
>
> JM
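
To make the JBOD-vs-striping distinction concrete: in Apache Hadoop, JBOD
simply means listing one directory per physical disk for the DataNode, and
each block file lives entirely on one of those disks, while RAID0 or LVM
presents a single directory backed by all the disks. A rough sketch, with
hypothetical mount points:

import org.apache.hadoop.conf.Configuration;

// Illustration only; /data/1../data/4 and /data/raid0 are hypothetical
// mount points. The property is dfs.datanode.data.dir on Hadoop 2.x
// (dfs.data.dir on 1.x).
public class JbodVsStriped {
    public static void main(String[] args) {
        Configuration jbod = new Configuration();
        // JBOD: one data directory per physical disk. HDFS places each
        // block file entirely on one of them, so a single read hits one disk.
        jbod.set("dfs.datanode.data.dir",
                 "/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn");

        Configuration striped = new Configuration();
        // RAID0/LVM: the same disks show up as one volume, so every block
        // file is split across the underlying disks by the array itself.
        striped.set("dfs.datanode.data.dir", "/data/raid0/dfs/dn");
    }
}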
>
> 2012/12/20 Varun Sharma <[EMAIL PROTECTED]>
>
> > Hi Jean,
> >
> > Very interesting benchmark - how were these numbers arrived at? Is this on
> > a real hbase cluster? To me, it felt kind of counter-intuitive that RAID0
> > beats JBOD on random seeks, because with RAID0 all disks need to seek at
> > the same time and the performance should basically be as bad as the
> > slowest-seeking disk.
> >
> > Varun
> >
> > On Wed, Dec 19, 2012 at 5:14 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
> >
> > > Yeah,
> > > I couldn't argue against LVMs when talking with the system admins.
> > > In terms of speed it's noise, because the CPUs are pretty efficient and
> > > unless you have more than 1 drive per physical core, you will end up
> > > saturating your disk I/O.
> > >
> > > In terms of MapR, you want the raw disk. (But we're talking Apache)
> > >
> > >
> > > On Dec 19, 2012, at 4:59 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> > >
> > > > Finally, it took me a while to run those tests because they took way
> > > > longer than expected, but here are the results:
> > > >
> > > > http://www.spaggiari.org/bonnie.html
> > > >
> > > > LVM is not really slower than JBOD and not really using more CPU. So
> > > > I will say, if you have to choose between the two, take the one you
> > > > prefer. Personally, I prefer LVM because it's easy to configure.
> > > >
> > > > The big winner here is RAID0. It's WAY faster than anything else. But
> > > > it's using twice the space... Your choice.
> > > >
> > > > I did not get a chance to test with the Ubuntu tool because it doesn't
> > > > work with LVM drives.
> > > >
> > > > JM
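
The random-seek side of those results can be approximated with a very small
probe like the one below (a sketch, not the bonnie++ runs linked above; the
path is hypothetical, and the file should be much larger than RAM so the page
cache does not hide the seeks):

import java.io.RandomAccessFile;
import java.util.Random;

// Seeks to random 64 KB-aligned offsets in a large pre-created file and
// reports reads/second, so the same workload can be pointed at a JBOD
// disk, an LVM volume, or a RAID0 array and compared.
public class SeekProbe {
    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "/data/1/testfile"; // hypothetical
        int readSize = 64 * 1024;
        int iterations = 2000;
        byte[] buf = new byte[readSize];
        Random rnd = new Random(42);

        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long blocks = raf.length() / readSize;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                raf.seek((long) (rnd.nextDouble() * blocks) * readSize);
                raf.readFully(buf);
            }
            double secs = (System.nanoTime() - start) / 1e9;
            System.out.printf("%d random %dK reads in %.1fs (%.0f reads/s)%n",
                    iterations, readSize / 1024, secs, iterations / secs);
        }
    }
}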
> > > >
> > > > 2012/11/28, Michael Segel <[EMAIL PROTECTED]>:
> > > >> Ok, just a caveat.
> > > >>
> > > >> I am discussing MapR as part of a complete response. As Mohit posted,
> > > >> MapR takes the raw device for their MapR File System.
> > > >> They do stripe on their own within what they call a volume.
> > > >>
> > > >> But going back to Apache...
> > > >> You can stripe drives; however, I wouldn't recommend it. I don't think
> > > >> the performance gains would really matter.
> > > >> You're going to end up getting blocked first by disk I/O, then your
> > > >> controller card, then your network... assuming 10GbE.
> > > >>
> > > >> With only 2 disks on an 8-core system, you will hit disk I/O first, and
> > > >> then you'll watch your CPU I/O wait climb.
> > > >>
> > > >> HTH
> > > >>
> > > >> -Mike
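
A back-of-envelope version of that bottleneck ordering, using assumed round
numbers rather than measurements from this thread:

// All figures below are assumptions for illustration: ~120 MB/s sequential
// per SATA spindle, a 6 Gb/s SATA/SAS link, and a 10GbE NIC.
public class BottleneckOrder {
    public static void main(String[] args) {
        int disks = 2;                       // drives per node, as in the example above
        double perDiskMBs = 120;             // assumed sequential rate per disk
        double controllerMBs = 6000 / 8.0;   // ~750 MB/s per 6 Gb/s link
        double networkMBs = 10000 / 8.0;     // ~1250 MB/s for 10GbE

        double diskAggregate = disks * perDiskMBs;
        System.out.printf("disks: %.0f MB/s, controller: %.0f MB/s, network: %.0f MB/s%n",
                diskAggregate, controllerMBs, networkMBs);
        // With only 2 spindles, the ~240 MB/s of disk bandwidth is the
        // ceiling, well below the controller and the 10GbE link, so the
        // node is disk-bound and the CPUs sit in I/O wait.
    }
}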
> > > >>
> > > >> On Nov 28, 2012, at 7:28 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> > > >>
> > > >>> Hi Mike,
> > > >>>
> > > >>> Why not use LVM with MapR? Since LVM is reading from 2 drives almost
> > > >>> at the same time, it should be better than RAID0 or a single drive,