Re: Performance tuning
There are quite a lot of established and time-wait connections between the
RegionServers on port 50010, but I don't know a good way of monitoring how much
data is going through each connection (if that's what you meant).
On Sun, Dec 22, 2013 at 12:00 AM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:

> Scans on RS 19 and 23, which have 5 regions instead of 4, stand out more
> than scans on RS 20, 21 and 22. But scans on RS 7 and 18, which also have 5
> regions, are doing fine: not the best, but still in the mid-range.
>
>
> On Sat, Dec 21, 2013 at 11:51 PM, Kristoffer Sjögren <[EMAIL PROTECTED]> wrote:
>
>> Yeah, I'm doing a count(*) query on the 96-region table. Do you mean to
>> check network traffic between the RegionServers?
>>
>> From debugging the Phoenix code I can see that 96 scans are sent, and
>> each response returned to the client contains only the row count for that
>> scan; these counts are then aggregated and returned. So the traffic between
>> the client and each RS is very small.
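The scatter/gather pattern described above (one scan per region, each returning only a partial count that the client sums) can be sketched as follows. This is a minimal illustration, not Phoenix's actual code: `scan_region_count` and `parallel_count` are hypothetical helpers, and a region is simulated by any sized Python sequence.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_region_count(region_rows):
    """Stand-in for one per-region scan (hypothetical helper, not a Phoenix API).

    The server side returns only a partial row count, never the rows themselves.
    """
    return len(region_rows)


def parallel_count(regions, max_workers=16):
    """Run one scan per region in parallel, then sum the partial counts on the client.

    The client only receives len(regions) small integers, which is why
    client <-> RegionServer traffic stays tiny for a count(*) query.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input regions.
        partials = list(pool.map(scan_region_count, regions))
    return sum(partials), partials


if __name__ == "__main__":
    # Hypothetical data: 96 equally sized regions holding 10.5 million rows in total.
    regions = [range(109_375) for _ in range(96)]
    total, partials = parallel_count(regions)
    print(total, len(partials))
```

Because the client has to wait for every partial result before it can sum them, the query's latency is bounded by the slowest scan, so a single slow RegionServer delays the whole count.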
>>
>>
>>
>>
>> On Sat, Dec 21, 2013 at 11:35 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>
>>> Thanks Kristoffer,
>>>
>>> yeah, that's the right metric. I would put my bet on the slower network.
>>> But you're also doing a select count(*) query in Phoenix, right? So
>>> nothing should really be sent across the network.
>>>
>>> When you do the queries, can you check whether there is any network
>>> traffic?
>>>
>>> -- Lars
>>>
>>>
>>>
>>> ________________________________
>>>  From: Kristoffer Sjögren <[EMAIL PROTECTED]>
>>> To: [EMAIL PROTECTED]; lars hofhansl <[EMAIL PROTECTED]>
>>> Sent: Saturday, December 21, 2013 1:28 PM
>>> Subject: Re: Performance tuning
>>>
>>>
>>> @pradeep scanner caching should not be an issue since data transferred to
>>> the client is tiny.
>>>
>>> @lars Yes, the data might be small for this particular case :-)
>>>
>>> I have checked everything I can think of on the RegionServers (CPU, network,
>>> HBase console, uptime, etc.) and nothing stands out, except for the pings
>>> (network pings).
>>> There are 5 regions on RS 7, 18, 19 and 23; the others have 4.
>>> hdfsBlocksLocalityIndex=100 on all RegionServers (was that the correct metric?)
>>>
>>> -Kristoffer
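The 4/5 split above is what an even assignment of 96 regions over the 23 RegionServers in this cluster produces. The sketch below shows that arithmetic, assuming the balancer spreads regions as evenly as possible and that the salted regions are roughly equal in size. `balanced_region_counts` and `relative_load` are hypothetical helpers.

```python
def balanced_region_counts(num_regions, num_servers):
    """Return {regions_per_server: number_of_servers} for an as-even-as-possible assignment."""
    base, extra = divmod(num_regions, num_servers)
    counts = {base: num_servers - extra}
    if extra:
        # The remainder goes one region each to `extra` servers.
        counts[base + 1] = extra
    return counts


def relative_load(regions_on_server, min_regions):
    """Scan work on a server relative to the least-loaded one, assuming equal-sized regions."""
    return regions_on_server / min_regions


if __name__ == "__main__":
    print(balanced_region_counts(96, 23))
    print(relative_load(5, 4))
```

Under these assumptions a 5-region server does 25% more scan work than a 4-region one. Because RS 7 and 18 also have 5 regions and perform in the mid-range, region count alone does not explain why RS 19 and 23 are slower.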
>>>
>>>
>>>
>>>
>>> On Sat, Dec 21, 2013 at 9:44 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>>
>>> > Hi Kristoffer,
>>> > For this particular problem: are many regions on the same RegionServers?
>>> > Did you profile those RegionServers? Anything weird on that box?
>>> > Slower pings might well be an issue. How's the data locality? (You can
>>> > check on a RegionServer's overview page.)
>>> > If needed, you can issue a major compaction to re-establish local data on
>>> > all RegionServers.
>>> >
>>> >
>>> > 32 cores matched with only 4G of RAM is a bit weird, but with your tiny
>>> > dataset it doesn't matter anyway.
>>> >
>>> > 10m rows across 96 regions is only about 100k rows per region. You won't
>>> > see many of the nice properties of HBase.
>>> > Try with 100m (or better, 1bn) rows. Then we're talking. For anything below
>>> > this you wouldn't want to use HBase anyway.
>>> > (100k rows I could scan on my phone with a Perl script in less than 1s.)
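The per-region figures above follow from dividing the table size by the number of regions. The sketch below assumes salting spreads rows evenly across buckets, which it aims for but does not guarantee. With the 10.5 million rows in this thread, that gives 109,375 rows per region for t_96 and 437,500 for t_24. `rows_per_region` is a hypothetical helper.

```python
def rows_per_region(total_rows, num_regions):
    """Average rows per region, assuming rows are spread evenly over the salt buckets."""
    return total_rows / num_regions


if __name__ == "__main__":
    # Current table size and the two larger sizes suggested above.
    sizes = [("10.5m", 10_500_000), ("100m", 100_000_000), ("1bn", 1_000_000_000)]
    for label, rows in sizes:
        for regions in (24, 96):
            print(f"{label:>6} rows, {regions:>2} regions: "
                  f"{rows_per_region(rows, regions):>12,.0f} rows/region")
```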
>>> >
>>> >
>>> > By "ping", do you mean an actual network ping, or some operation on top of
>>> > HBase?
>>> >
>>> >
>>> > -- Lars
>>> >
>>> >
>>> >
>>> > ________________________________
>>> >  From: Kristoffer Sjögren <[EMAIL PROTECTED]>
>>> > To: [EMAIL PROTECTED]
>>> > Sent: Saturday, December 21, 2013 11:17 AM
>>> > Subject: Performance tuning
>>> >
>>> >
>>> > Hi
>>> >
>>> > I have been performance tuning HBase 0.94.6 running Phoenix 2.2.0 for the
>>> > last couple of days and need some help.
>>> >
>>> > Background.
>>> >
>>> > - 23-machine cluster, 32 cores, 4GB heap per RS.
>>> > - Table t_24 has 24 online regions (24 salt buckets).
>>> > - Table t_96 has 96 online regions (96 salt buckets).
>>> > - 10.5 million rows per table.
>>> > - Count query - select count(*) from ...
>>> > - Group by query - select A, B, C, sum(D) from ... where (A = 1 and T >= 0