HBase >> mail # user >> EC2 Elastic MapReduce HBase install recommendations


Re: EC2 Elastic MapReduce HBase install recommendations
I chose Amazon mainly because it is supposedly high performance and, more
importantly, because HBase is already set up when I launch it as an EMR
workflow. I wanted to save the time of setting up the cluster manually on
EC2 instances.

Are you saying I would get higher performance by setting up HBase on the
cluster manually instead of using the default Amazon HBase distribution? Or
is it worth tuning the Amazon distribution with a bootstrap action? How
long does it take to set up the cluster with HDFS manually?

I will also try larger instance types.
On Thu, May 9, 2013 at 6:47 AM, Michel Segel <[EMAIL PROTECTED]> wrote:

> With respect to EMR, you can run HBase fairly easily.
> You can't run MapR with HBase on EMR; stick with Amazon's release.
>
> And you can run it but you will want to know your tuning parameters up
> front when you instantiate it.
>
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On May 8, 2013, at 9:04 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
> > M7 is not Apache HBase, or any HBase. It is a proprietary NoSQL datastore
> > with (I gather) an Apache HBase compatible Java API.
> >
> > As for running HBase on EC2, we recently discussed some particulars; see
> > the latter part of this thread: http://search-hadoop.com/m/rI1HpK90gu where
> > I hijack it. I wouldn't recommend launching HBase as part of an EMR flow
> > unless you want to use it only for temporary random access storage, in
> > which case use m2.2xlarge/m2.4xlarge instance types. Otherwise, set up a
> > dedicated HBase-backed storage service on high I/O instance types. The
> > fundamental issue is that I/O performance on the EC2 platform is fair to
> > poor.
> >
> > I have also noticed a large difference in baseline block device latency
> > between an old Amazon Linux AMI (< 2013) and the latest AMIs from this
> > year. Use the new ones; they cut the latency long tail in half. I gather
> > there were some significant kernel-level improvements.
> >
> >
> > On Wed, May 8, 2013 at 10:42 AM, Marcos Luis Ortiz Valmaseda <
> > [EMAIL PROTECTED]> wrote:
> >
> >> I think that when you talk about RMap, you are referring to MapR's
> >> distribution.
> >> I think that MapR's team released a very good version of its Hadoop
> >> distribution focused on HBase called M7. You can see its overview here:
> >> http://www.mapr.com/products/mapr-editions/m7-edition
> >>
> >> But this release was in beta testing, and I see that it's not included
> >> in the Amazon Marketplace yet:
> >>
> >> https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5
> >>
> >>
> >>
> >>
> >> 2013/5/7 Pal Konyves <[EMAIL PROTECTED]>
> >>
> >>> Hi,
> >>>
> >>> Has anyone got recommendations for running HBase on EC2? I am
> >>> testing it, and so far I am very disappointed with it. I did not change
> >>> anything in the default 'Amazon distribution' installation. It has one
> >>> master node and two slave nodes, and write performance is around 2500
> >>> small rows per sec at most, but I expected it to be way better. Oh, and
> >>> this is with batch put operations with autocommit turned off, where each
> >>> batch contains about 500-1000 rows... When I do it with autocommit, it
> >>> does not even reach 1000 rows per sec.
> >>>
> >>> All nodes were m1.large instances.
> >>>
> >>> Any experiences or suggestions? Is it worth trying the RMap distribution
> >>> instead of the Amazon one?
> >>>
> >>> Thanks,
> >>> Pal
> >>
> >>
> >>
> >> --
> >> Marcos Ortiz Valmaseda
> >> Product Manager at PDVSA
> >> http://about.me/marcosortiz
> >
> >
> >
> > --
> > Best regards,
> >
> >   - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
>
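
To put the figures from the original message in perspective, here is a rough back-of-the-envelope sketch of what 2500 rows per sec means per batch. It assumes a single client thread sending batches one after another, which the thread does not actually state; the function name batch_stats is made up for illustration.

```python
# Figures quoted in the original message of this thread.
ROWS_PER_SEC = 2500          # best observed write rate with batched puts
BATCH_SIZES = (500, 1000)    # reported range of rows per batch


def batch_stats(rows_per_sec, batch_size):
    """Return (batches per second, milliseconds per batch).

    Assumption (not stated in the thread): one writer that sends
    batches serially, so time per batch is simply 1 / batch rate.
    """
    batches_per_sec = rows_per_sec / batch_size
    ms_per_batch = 1000.0 / batches_per_sec
    return batches_per_sec, ms_per_batch


for size in BATCH_SIZES:
    per_sec, ms = batch_stats(ROWS_PER_SEC, size)
    print(f"{size} rows/batch: {per_sec:.1f} batches/s, ~{ms:.0f} ms per batch")
```

Under that assumption each batch takes roughly 200 to 400 ms end to end. If several client threads were writing in parallel, the time per batch would be correspondingly higher, so the numbers are only a starting point for deciding where to look (client, network, or region server I/O).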