HBase user mailing list

Re: EC2 Elastic MapReduce HBase install recommendations
A high collision rate means high contention when taking the row locks,
which results in poor write performance.
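To make the row-lock argument concrete, here is a toy Python model. It is a hypothetical illustration, not HBase's region server code: the write stream is split into batches of `handlers` concurrent writes, each write costs one time unit, and writes to the same row key inside a batch must wait for that row's lock. The function name `simulated_time` and the batch scheduling are assumptions made for this sketch. For scale, in the data described below, 4 million inserts landed on about 5000 rows, i.e. roughly 800 writes per row key on average.

```python
from collections import Counter


def simulated_time(row_keys, handlers):
    """Toy model of per-row write locking (an illustration, not HBase internals).

    The write stream is cut into consecutive batches of `handlers` writes that
    run concurrently. Each write costs one time unit, but writes to the same
    row key must take that row's lock one after another, so a batch lasts as
    long as the number of writes to its most frequent key.
    """
    total = 0
    for start in range(0, len(row_keys), handlers):
        batch = row_keys[start:start + handlers]
        total += max(Counter(batch).values())
    return total


if __name__ == "__main__":
    writes, handlers = 8000, 10
    # Every write targets its own row: no lock contention.
    distinct = [f"row{i}" for i in range(writes)]
    # Runs of 800 consecutive writes hit the same row (10 rows in total).
    colliding = [f"row{i // 800}" for i in range(writes)]
    print(simulated_time(distinct, handlers))   # 800
    print(simulated_time(colliding, handlers))  # 8000
```

In this model the total work is the same 8000 units in both cases; the colliding stream is slower only because writes that share a row lose their concurrency. The size of the slowdown depends on how colliding keys are ordered in the stream, so it should not be read as a prediction of real HBase throughput.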

Cheers

On May 11, 2013, at 7:14 PM, Pal Konyves <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I decided not to do any tuning, because my whole project is about
> experimenting with HBase (it's a school project). However, it turned out that
> my sample data generated lots of rowkey collisions: 4 million inserts only
> resulted in about 5000 rows, although the data in the columns were different.
> When I changed my sample dataset to have no rowkey collisions, the
> performance increased by a factor of 10. Why is that?
>
> Thanks,
> Pal
>
>
> On Thu, May 9, 2013 at 2:32 PM, Michel Segel <[EMAIL PROTECTED]> wrote:
>
>> What I am saying is that by default, you get two mappers per node.
>> An x4large can run HBase with more mapred slots, so you will want to tune
>> the defaults based on machine size. Not just mapred, but HBase settings too.
>> You need to do this on startup of the EMR cluster, though...
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On May 9, 2013, at 2:39 AM, Pal Konyves <[EMAIL PROTECTED]> wrote:
>>
>>> Principally, I chose Amazon because it is supposedly high
>>> performance, and, more importantly, HBase is already set up if I choose
>>> it as an EMR workflow. I wanted to save the time of setting up the cluster
>>> manually on EC2 instances.
>>>
>>> Are you saying I will reach higher performance if I set up HBase on
>>> the cluster manually, instead of using the default Amazon HBase
>>> distribution? Or is it worth tuning the Amazon distribution with a
>>> bootstrap action? How long does it take to set up the cluster with HDFS
>>> manually?
>>>
>>> I will also try larger instance types.
>>>
>>>
>>> On Thu, May 9, 2013 at 6:47 AM, Michel Segel <[EMAIL PROTECTED]> wrote:
>>>
>>>> With respect to EMR, you can run HBase fairly easily.
>>>> You can't run MapR with HBase on EMR; stick with Amazon's release.
>>>>
>>>> And you can run it, but you will want to know your tuning parameters up
>>>> front when you instantiate it.
>>>>
>>>> Mike Segel
>>>>
>>>> On May 8, 2013, at 9:04 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> M7 is not Apache HBase, or any HBase. It is a proprietary NoSQL
>>>>> datastore with (I gather) an Apache HBase compatible Java API.
>>>>>
>>>>> As for running HBase on EC2, we recently discussed some particulars;
>>>>> see the latter part of this thread, where I hijack it:
>>>>> http://search-hadoop.com/m/rI1HpK90gu
>>>>> I wouldn't recommend launching HBase as part of an EMR flow unless you
>>>>> want to use it only for temporary random access storage, in which case
>>>>> use m2.2xlarge/m2.4xlarge instance types. Otherwise, set up a dedicated
>>>>> HBase-backed storage service on high I/O instance types. The
>>>>> fundamental issue is that I/O performance on the EC2 platform is fair
>>>>> to poor.
>>>>>
>>>>> I have also noticed a large difference in baseline block device latency
>>>>> between old Amazon Linux AMIs (< 2013) and the latest AMIs from this
>>>>> year. Use the new ones; they cut the latency long tail in half. There
>>>>> were some significant kernel-level improvements, I gather.
>>>>>
>>>>>
>>>>> On Wed, May 8, 2013 at 10:42 AM, Marcos Luis Ortiz Valmaseda <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I think that when you are talking about RMap, you are referring to
>>>>>> MapR's distribution.
>>>>>> I think that MapR's team released a very good version of its Hadoop
>>>>>> distribution focused on HBase, called M7. You can see its overview here:
>>>>>> http://www.mapr.com/products/mapr-editions/m7-edition
>>>>>>
>>>>>> But this release was under beta testing, and I see that it's not
>>>>>> included in the Amazon Marketplace yet:
>>>>>> https://aws.amazon.com/marketplace/seller-profile?id=802b0a25-877e-4b57-9007-a3fd284815a5