HBase >> mail # user >> HBase scan performance decreases over time.


David Koch 2012-11-03, 15:12
Ted Yu 2012-11-03, 15:42
David Koch 2012-11-03, 19:50
Michael Segel 2012-11-05, 13:04
Asaf Mesika 2012-11-05, 18:14
Re: HBase scan performance decreases over time.
There is a property, dfs.balance.bandwidthPerSec, in hdfs-site.xml:
<property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>6250000</value>
    <description>
        Specifies the maximum amount of bandwidth that each datanode
        can utilize for the balancing purpose in terms of
        the number of bytes per second.
    </description>
</property>
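For scale, the value 6250000 in this property works out to 50 Mbit/s per datanode. Here is a minimal Python sketch of that arithmetic. It assumes decimal units (1 Mbit = 10^6 bits, 1 GB = 10^9 bytes), a 1 Gbit/s link, and, as a rough illustration only, that the whole ~30 GB daily push mentioned earlier in the thread has to pass through a single datanode at the capped rate. The function names are made up for this example and are not part of any Hadoop API.

```python
# Back-of-the-envelope arithmetic for a balancer bandwidth cap.
# All helper names are illustrative; units are decimal.

def bytes_per_sec_to_mbit(bytes_per_sec: int) -> float:
    """Convert a bytes-per-second cap to megabits per second."""
    return bytes_per_sec * 8 / 1_000_000


def link_fraction(bytes_per_sec: int, link_bits_per_sec: int = 1_000_000_000) -> float:
    """Fraction of a network link (default: 1 Gbit/s) that the cap allows."""
    return bytes_per_sec * 8 / link_bits_per_sec


def seconds_to_move(total_bytes: int, bytes_per_sec: int) -> float:
    """Time to push total_bytes through one datanode at the capped rate."""
    return total_bytes / bytes_per_sec


if __name__ == "__main__":
    cap = 6_250_000  # value from the property in this message
    print(f"{bytes_per_sec_to_mbit(cap):.1f} Mbit/s")
    print(f"{link_fraction(cap):.1%} of a 1 Gbit/s link")
    print(f"{seconds_to_move(30 * 10**9, cap) / 60:.0f} minutes for 30 GB")
```

In a real cluster the balancer moves blocks between many datanodes in parallel, so the time estimate is only an upper-bound style illustration for a single node.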
Thank you!

Sincerely,
Leonid Fedotov
On Nov 5, 2012, at 10:14 AM, Asaf Mesika wrote:

> Where is this setting located?
>
> Sent from my iPhone
>
> On 5 Nov 2012, at 15:05, Michael Segel <[EMAIL PROTECTED]> wrote:
>
>> There's an HDFS bandwidth setting which is set to 10 MB/s.
>>
>> Way too low for even 1 GbE.
>>
>> Have you modified this setting yet?
>>
>> -Mike
>>
>> On Nov 3, 2012, at 2:50 PM, David Koch <[EMAIL PROTECTED]> wrote:
>>
>>> Hello Ted,
>>>
>>> We never initiate major compaction manually. I have not looked at I/O
>>> balance between nodes in detail. We have noticed that after running for a
>>> couple of weeks, HBase seems to spend hours pushing blocks between nodes in
>>> order to optimize things. We add data daily in one ~30 GB push to several
>>> tables. Sometimes nodes get added to the running system.
>>>
>>> Where can I get more information on how to carry out performance-related
>>> HBase administrative tasks?
>>>
>>> Thank you,
>>>
>>> /David
>>>
>>>
>>> On Sat, Nov 3, 2012 at 4:42 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>>> Can you tell us how often you run major compaction after the import?
>>>> Have you noticed imbalanced read/write requests in the cluster, meaning
>>>> that a subset of region servers receives the bulk of the writes?
>>>>
>>>> We do some manual movement of regions when the above happens.
>>>>
>>>> Cheers
>>>>
>>>> On Sat, Nov 3, 2012 at 8:12 AM, David Koch <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Every now and then we need to flatten our cluster and re-import all data
>>>>> from log files (changes in data format, etc.). Afterwards we notice a
>>>>> significant increase in scan performance. As data is added and shuffled
>>>>> around between region servers, performance goes down again over time
>>>>> (say a couple of weeks). Are there any routine operations that one should
>>>>> run manually, or settings to activate in the HBase configuration, to keep
>>>>> the data well distributed? We use HBase 0.92 as part of a Cloudera4 cluster.
>>>>>
>>>>> Thank you,
>>>>>
>>>>> /David
>>

Michael Segel 2012-11-05, 18:49
Ted Yu 2012-11-03, 20:14