Re: help on key design

You split the region that is hot. What's to stop all of the keys that the OP wants from still being within the same region? Not to mention... how do you control which region is on which region server?
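
To make that concern concrete, here is a minimal sketch of how a sorted key space maps row keys to regions. The region start keys and the requested keys are made up for illustration and are not the OP's data; `region_index` is a hypothetical helper, not an HBase API.

```python
import bisect

def region_index(start_keys, row_key):
    """Return the index of the region whose [start, next_start) range holds row_key.

    start_keys must be sorted and begin with b"" (the first region's start key).
    """
    return bisect.bisect_right(start_keys, row_key) - 1

# Hypothetical layout: 4 regions, then the hot region (index 1) is split at b"g".
before = [b"", b"d", b"m", b"t"]
after = [b"", b"d", b"g", b"m", b"t"]

# Hypothetical requested keys that share a prefix, so they sort next to each other.
hot_keys = [b"h001", b"h002", b"h003"]

print({region_index(before, k) for k in hot_keys})  # a single region index
print({region_index(after, k) for k in hot_keys})   # still a single region index
```

Because the requested keys sort next to each other, they fall on the same side of the split point and end up together in one of the two daughter regions.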

Just food for thought.

If the OP is doing get()s, then he may want to consider taking a hash of the row key, truncating it to 4 bytes, and prepending it to his key. This should distribute his keys more randomly across the key space.
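
A minimal sketch of that idea follows. The choice of MD5 is an assumption (the suggestion above only says "the hash"), and `salted_key` is a hypothetical helper name, not part of any HBase client.

```python
import hashlib

PREFIX_LEN = 4  # number of hash bytes kept, as suggested above

def salted_key(row_key: bytes) -> bytes:
    """Prepend the first PREFIX_LEN bytes of the key's MD5 digest to the key.

    MD5 is an assumption made for this sketch; any stable hash would do.
    """
    prefix = hashlib.md5(row_key).digest()[:PREFIX_LEN]
    return prefix + row_key

# Hypothetical keys that would otherwise sort next to each other.
for i in range(3):
    original = b"user123-%04d" % i
    print(salted_key(original).hex())
```

The prefix is derived from the key itself, so a client doing a get() can recompute it from the original key. The trade-off is that the stored keys no longer sort in their original order, so a scan over a contiguous range of the original keys is no longer possible.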

On Jul 31, 2013, at 1:57 PM, Pablo Medina <[EMAIL PROTECTED]> wrote:

> If you split that one hot region and then move one half to another region
> server, you will move half of the load off that hot region server.
> The set of hot keys will then be spread over 2 region servers instead of
> one.
> 2013/7/31 Michael Segel <[EMAIL PROTECTED]>
>> 4 regions on 3 servers?
>> I'd say that they were already balanced.
>> The issue is that when they do their get(s) they are hitting one region.
>> So more splits isn't the answer.
>> On Jul 31, 2013, at 12:49 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>> From the information Demian provided in the first email:
>>> bq. a table containing 20 million keys split automatically by HBase in 4
>>> regions and balanced in 3 region servers
>>> I think the number of regions should be increased through (manual)
>>> splitting so that the data is spread more evenly across servers.
>>> If the Gets are scattered across the whole key space, there is some
>>> optimization the client can do: namely, group the Gets by region boundary
>>> and issue one multi-get per region.
>>> Please also refer to http://hbase.apache.org/book.html#rowkey.design,
>>> especially 6.3.2.
>>> Cheers
>>> On Wed, Jul 31, 2013 at 10:14 AM, Dhaval Shah <[EMAIL PROTECTED]> wrote:
>>>> Looking at https://issues.apache.org/jira/browse/HBASE-6136 it seems like
>>>> the 500 Gets are executed sequentially on the region server.
>>>> Also 3k requests per minute = 50 requests per second. Assuming your
>>>> requests take 1 sec (which seems really long but who knows) then you need
>>>> at least 50 threads/region server handlers to handle these. The default for
>>>> that number on some older versions of HBase is 10, which means you are
>>>> running out of threads. Which brings up the following questions:
>>>> What version of HBase are you running?
>>>> How many region server handlers do you have?
>>>> Regards,
>>>> Dhaval
>>>> ----- Original Message -----
>>>> From: Demian Berjman <[EMAIL PROTECTED]>
>>>> Cc:
>>>> Sent: Wednesday, 31 July 2013 11:12 AM
>>>> Subject: Re: help on key design
>>>> Thanks for the responses!
>>>>> why don't you use a scan
>>>> I'll try that and compare it.
>>>>> How much memory do you have for your region servers? Have you enabled
>>>>> block caching? Is your CPU spiking on your region servers?
>>>> Block caching is enabled. CPU and memory don't seem to be a problem.
>>>> We think we are saturating a region because of the quantity of keys requested.
>>>> In that case, my question is whether asking for 500+ keys per request is a
>>>> normal scenario.
>>>> Cheers,
>>>> On Wed, Jul 31, 2013 at 11:24 AM, Pablo Medina <[EMAIL PROTECTED]> wrote:
>>>>> The scan can be an option if the cost of scanning undesired cells and
>>>>> discarding them through filters is lower than the cost of accessing those
>>>>> keys individually. I would say that as the number of 'undesired' cells
>>>>> decreases, the scan's overall performance/efficiency increases. It all
>>>>> depends on how the keys are designed to be grouped together.
>>>>> 2013/7/30 Ted Yu <[EMAIL PROTECTED]>
>>>>>> Please also go over http://hbase.apache.org/book.html#perf.reading
>>>>>> Cheers
>>>>>> On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <
>>>>>>> wrote:
>>>>>>> If all your keys are grouped together, why don't you use a scan with