Re: EC2 Elastic MapReduce HBase install recommendations
What I am saying is that by default, you get two mappers per node.
An x4large can run HBase with more mapred slots, so you will want to tune the defaults based on machine size. Not just the mapred settings, but the HBase settings too. You need to do this at startup of the EMR cluster, though...

Sent from a remote device. Please excuse any typos...

Mike Segel
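
A rough sketch of what "tune the defaults based on machine size" could look like in practice. The heuristic itself, the function name `suggest_slots`, and the default heap reservations are assumptions made for illustration and do not come from this thread; the two property names are the standard Hadoop 1.x TaskTracker slot settings.

```python
def suggest_slots(vcpus, ram_gb, hbase_heap_gb=4.0, os_reserve_gb=1.0, task_heap_gb=1.0):
    """Hypothetical heuristic: size MapReduce slots around what HBase leaves free."""
    # Memory left after reserving the RegionServer heap and some OS headroom.
    free_gb = ram_gb - hbase_heap_gb - os_reserve_gb
    by_memory = int(free_gb // task_heap_gb) if free_gb > 0 else 0
    # Leave one core for the RegionServer and DataNode daemons.
    by_cpu = vcpus - 1
    map_slots = max(1, min(by_memory, by_cpu))
    reduce_slots = max(1, map_slots // 2)
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }


if __name__ == "__main__":
    # Illustrative node sizes only; substitute the real specs of your instance type.
    for name, vcpus, ram_gb in [("small node", 2, 7.5), ("large node", 8, 68.0)]:
        print(name, suggest_slots(vcpus, ram_gb))
```

Whatever values come out of a calculation like this would have to be supplied when the cluster is launched, since the settings need to be in place at startup.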

On May 9, 2013, at 2:39 AM, Pal Konyves <[EMAIL PROTECTED]> wrote:

> Principally I chose to use Amazon because they are supposedly high
> performance and, more importantly, HBase is already set up if I choose
> it as an EMR workflow. I wanted to save the time of setting up the cluster
> manually on EC2 instances.
>
> Are you saying I will reach higher performance if I set up HBase on
> the cluster manually, instead of using the default Amazon HBase distribution? Or
> is it worth tuning the Amazon distribution with a bootstrap action? How
> long does it take to set up the cluster with HDFS manually?
>
> I will also try larger instance types.
>
>
> On Thu, May 9, 2013 at 6:47 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> With respect to EMR, you can run HBase fairly easily.
>> You can't run MapR with HBase on EMR; stick with Amazon's release.
>>
>> And you can run it, but you will want to know your tuning parameters up
>> front when you instantiate it.
>>
>>
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On May 8, 2013, at 9:04 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>
>>> M7 is not Apache HBase, or any HBase. It is a proprietary NoSQL datastore
>>> with (I gather) an Apache HBase compatible Java API.
>>>
>>> As for running HBase on EC2, we recently discussed some particulars; see
>>> the latter part of this thread: http://search-hadoop.com/m/rI1HpK90gu where
>>> I hijack it. I wouldn't recommend launching HBase as part of an EMR flow
>>> unless you want to use it only for temporary random access storage, in
>>> which case use m2.2xlarge/m2.4xlarge instance types. Otherwise, set up a
>>> dedicated HBase-backed storage service on high I/O instance types. The
>>> fundamental issue is that I/O performance on the EC2 platform is fair to poor.
>>>
>>> I have also noticed a large difference in baseline block device latency
>>> between an old Amazon Linux AMI (< 2013) and the latest AMIs from this year.
>>> Use the new ones; they cut the latency long tail in half. There were some
>>> significant kernel-level improvements, I gather.
>>>
>>>
>>> On Wed, May 8, 2013 at 10:42 AM, Marcos Luis Ortiz Valmaseda <
>>> [EMAIL PROTECTED]> wrote:
>>>
>>>> I think that when you talk about RMap, you are referring to
>>>> MapR's distribution.
>>>> I think that MapR's team released a very good version of its Hadoop
>>>> distribution focused on HBase, called M7. You can see its overview here:
>>>> http://www.mapr.com/products/mapr-editions/m7-edition
>>>>
>>>> But this release was under beta testing, and I see that it's not included
>>>> in the Amazon Marketplace yet:
>>>> https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5
>>>>
>>>>
>>>>
>>>>
>>>> 2013/5/7 Pal Konyves <[EMAIL PROTECTED]>
>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone got some recommendations about running HBase on EC2? I am
>>>>> testing it, and so far I am very disappointed with it. I did not change
>>>>> anything about the default 'Amazon distribution' installation. It has one
>>>>> master node and two slave nodes, and write performance is around 2500 small
>>>>> rows per sec at most, but I expected it to be way better. Oh, and this is
>>>>> with batch put operations with autocommit turned off, where each batch
>>>>> contains about 500-1000 rows... When I do it with autocommit, it does not
>>>>> even reach 1000 rows per sec.
>>>>>
>>>>> All nodes were m1.large instances.
>>>>>
>>>>> Any experiences, suggestions? Is it worth trying the RMap distribution
>>>>> instead of the Amazon one?
>>>>>
>
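
To illustrate why batching matters for the write rates described in the original question, here is a minimal sketch of client-side write buffering. `BufferedWriter`, `send_batch`, and the row tuples are hypothetical stand-ins and not the HBase client API; the sketch only counts round trips and says nothing about actual EC2 throughput.

```python
class BufferedWriter:
    """Hypothetical client-side buffer: collects rows and sends them in batches."""

    def __init__(self, send_batch, batch_size):
        self.send_batch = send_batch  # callable that performs one round trip
        self.batch_size = batch_size
        self.buffer = []

    def put(self, row):
        # Queue the row locally; only talk to the server when the batch is full.
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever is queued as a single batch, then empty the buffer.
        if self.buffer:
            self.send_batch(list(self.buffer))
            self.buffer.clear()


def count_round_trips(num_rows, batch_size):
    """Write num_rows through the buffer and return how many batches were sent."""
    sent = []
    writer = BufferedWriter(sent.append, batch_size)
    for i in range(num_rows):
        writer.put(("row-%d" % i, "value"))
    writer.flush()  # push any remainder
    return len(sent)


if __name__ == "__main__":
    print(count_round_trips(2500, 1))    # one round trip per row
    print(count_round_trips(2500, 500))  # one round trip per 500 rows
```

For 2500 rows, a batch size of 1 (the analogue of leaving autocommit on) costs 2500 round trips, while a batch size of 500 costs 5.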