HBase, mail # user - EC2 instance type recommendation ?


Re: EC2 instance type recommendation ?
Varun Sharma 2013-07-16, 19:08
We have both c1.xlarge and hi1.4xlarge clusters at Pinterest. We have used
the following guidelines:

1) hi1.4xlarge - small data sets, random-read heavy and IOPS bound - very
expensive per GB but very cheap per IOP
2) c1.xlarge/m1.xlarge - larger data sets, medium to low read load - cheap
per GB but expensive per IOP

Folks have run m2.4xlarge instances for Cassandra, but we never benchmarked
those instances for HBase.

Note that it also boils down to how many reads you are doing. Are you doing
RAID0 or JBOD? We always do JBOD. We found that the SATA disks are able to
consistently give good latencies as long as the # of IOPS per disk stays
under 100-150 - I think it's typical practice to not exceed that bound.
That said, HBase could be doing some extra IOPS for every read - if blooms
are not effective, you could be doing 3-4 IOPS per read request. So each
disk will only be able to give you a meagre ~30 reads per second with good
latency - roughly 120 requests per second from 4 drives.
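
To make that arithmetic concrete, here is a quick back-of-the-envelope
sketch in plain Java. The disk IOPS budget and IOPS-per-read numbers are
the rough estimates from above, not measurements:

public class ReadThroughputEstimate {
    public static void main(String[] args) {
        // Assumed numbers from the discussion above, estimates only.
        int disksPerNode = 4;        // JBOD ephemeral disks (e.g. c1.xlarge)
        int iopsBudgetPerDisk = 120; // SATA behaves well under ~100-150 IOPS
        int iopsPerRead = 4;         // worst case when blooms are ineffective

        int readsPerDiskPerSec = iopsBudgetPerDisk / iopsPerRead;   // ~30
        int readsPerNodePerSec = readsPerDiskPerSec * disksPerNode; // ~120

        System.out.println(readsPerDiskPerSec + " reads/sec per disk, "
                + readsPerNodePerSec + " reads/sec per node");
    }
}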

It is hard to say without knowing how many reads per second you are
throwing at the cluster, how many store files you have, and how effective
your blooms are.
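
If you want a quick way to make sure blooms are on for a read-heavy
family, here is a minimal sketch against the 0.94-era client API (the
table and family names are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class CreateTableWithBlooms {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Hypothetical table and column family names.
        HTableDescriptor table = new HTableDescriptor("mytable");
        HColumnDescriptor family = new HColumnDescriptor("d");

        // ROW blooms skip store files that can't contain the row,
        // cutting the extra IOPS per read mentioned above.
        family.setBloomFilterType(StoreFile.BloomType.ROW);

        table.addFamily(family);
        admin.createTable(table);
        admin.close();
    }
}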

Thanks
Varun
On Tue, Jul 16, 2013 at 11:52 AM, Amit Mor <[EMAIL PROTECTED]> wrote:

> Thanks!
> I was afraid to go into the SSD world with regards to VM, HDFS and HBase.
> Doesn't it break the whole concept of sequential r/w?
> I've been using m1.xlarge and m2.4xlarge. Then again, using the 68GB for
> a JVM heap is, mmm, horrific. Worst of all, I found that each of the 4
> m1.xlarge disks performs better than each of the 2 disks on m2.4xlarge.
>  On Jul 16, 2013 9:32 PM, "Bryan Beaudreault" <[EMAIL PROTECTED]>
> wrote:
>
> > You're right that EC2 does not provide great instance types for HBase.
> > In our research we found that most people seemed to be using c1.xlarge.
> > This still is not great, because 7GB of RAM for DataNode + RegionServer
> > + OS is pitiful, but it works. Of course that also makes it impossible
> > to colocate TaskTrackers on the RegionServer nodes. We also tried
> > m1.xlarge in the beginning, and it's nice to have the memory but we
> > quickly became CPU bound and thus moved to c1.xlarge. Of course you're
> > not going to get good disk performance on these ephemeral disks, but
> > EBS is trouble in our experience.
> >
> > At HBaseCon, the Pinterest engineers mentioned using hi1.4xlarge.
> > Those are very expensive though and unfortunately just have 2 SSD
> > disks. Perhaps they have benchmarks or can chime in as to whether they
> > ventured to use EBS to supplement the 2 disks or what.
> >
> >
> > On Tue, Jul 16, 2013 at 2:18 PM, Amit Mor <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hello, I am curious to hear the recommendations people have here for
> > > running HBase on EC2 instances. I failed to find an instance that
> > > has a good ratio/balance between CPU, # of disks and acceptable JVM
> > > heap memory to be used with an (almost random) read-intensive
> > > cluster. The major bottleneck I found was the disk read latency,
> > > which could even reach an svctm of a second ... Combined with
> > > everything else ;-)
> > >
> > > Thanks,
> > > Amit
> > >
> >
>