> Hi Adrien,
> I’ve run hdfs fsck and hbase hbck; HDFS is healthy and HBase is consistent.
> I’m using the default replication factor, so it is 3.
> There are some under-replicated blocks, though.
> The HBase master (node 10.10.8.55) is constantly reading from the regionservers.
> Today alone it has sent >150,000 HDFS_READ requests to each regionserver so far,
> while the HBase cluster is almost idle.
> What could cause this kind of behaviour?
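>
> (For reference, a request count like that can be pulled from a datanode log with
> something like the following; the log path is an assumption and may differ per install:
>   grep "$(date +%F)" /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log | grep -c 'op: HDFS_READ'
> which counts today's HDFS_READ clienttrace entries on that datanode.)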
>
> P.S. Each node in the cluster has 2 cores and 4 GB RAM, just in case.
>
> Thanks.
>
>
> On 03 Sep 2015, at 17:46, Adrien Mogenet <[EMAIL PROTECTED]>
> wrote:
>
> Is your HDFS healthy (fsck /)?
>
> Same for hbase hbck?
>
> What's your replication level?
>
> Can you see constant network use as well?
>
> Anything that might be triggered by the HBase master? (something like a
> virtually dead RS due to a ZK race condition, etc.)
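>
> (Roughly, those checks are, assuming the hdfs and hbase CLIs are on the PATH:
>   hdfs fsck /                              # overall HDFS health
>   hbase hbck                               # HBase consistency check
>   hdfs getconf -confKey dfs.replication    # configured replication factor
> plus watching network throughput on a regionserver, e.g. with iftop or dstat.)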
>
> Your balancer run from 3 weeks ago shouldn't have any effect if you
> successfully ran a major compaction yesterday.
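>
> (If a table needs it, a major compaction is typically triggered from the hbase shell,
> e.g. major_compact 'your_table'; the table name here is only a placeholder.)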
>
> On 3 September 2015 at 16:32, Akmal Abbasov <[EMAIL PROTECTED]>
> wrote:
>
>> I started the HDFS balancer, but then stopped it immediately after learning
>> that it is not a good idea.
>> That was around 3 weeks ago, though; is it possible that it has had an influence on
>> the cluster behaviour I’m seeing now?
>> Thanks.
>>
>> On 03 Sep 2015, at 14:23, Akmal Abbasov <[EMAIL PROTECTED]> wrote:
>>
>> Hi Ted,
>> No, there is no short-circuit read configured.
>> The datanode logs on 10.10.8.55 are full of the following messages:
>> 2015-09-03 12:03:56,324 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 77, op: HDFS_READ,
>> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
>> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
>> BP-439084760-10.32.0.180-1387281790961:blk_1075349331_1612273, duration:
>> 276448307
>> 2015-09-03 12:03:56,494 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 538, op: HDFS_READ,
>> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
>> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
>> BP-439084760-10.32.0.180-1387281790961:blk_1075349334_1612276, duration:
>> 60550244
>> 2015-09-03 12:03:59,561 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>> 10.10.8.55:50010, dest: /10.10.8.53:58622, bytes: 455, op: HDFS_READ,
>> cliID: DFSClient_NONMAPREDUCE_-483065515_1, offset: 0, srvID:
>> ee7d0634-89a3-4ada-a8ad-7848214397be, blockid:
>> BP-439084760-10.32.0.180-1387281790961:blk_1075351814_1614757, duration:
>> 755613819
>> There are >100,000 of them just for today. The situation with the other
>> regionservers is similar.
>> Node 10.10.8.53 is the hbase-master node, and the process on that port is
>> also hbase-master.
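>>
>> (The process behind that port can be checked on 10.10.8.53 with e.g.
>>   lsof -nP -iTCP:58622
>> or netstat -tnp | grep 58622; just one way to map the connection back to a process.)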
>> So if there is no load on the cluster, why is there so much IO happening?
>> Any thoughts?
>> Thanks.
>>
>> On 02 Sep 2015, at 21:57, Ted Yu <[EMAIL PROTECTED]> wrote:
>>
>> I assume you have enabled short-circuit read.
>>
>> Can you capture region server stack trace(s) and pastebin them ?
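>>
>> (For example: jps to find the regionserver pid, then jstack <pid> > rs-stack.txt;
>> just one way to capture the trace.)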
>>
>> Thanks
>>
>> On Wed, Sep 2, 2015 at 12:11 PM, Akmal Abbasov <[EMAIL PROTECTED]>
>> wrote:
>>
>>> Hi Ted,
>>> I’ve checked the time when the addresses were changed, and this strange
>>> behaviour started weeks before that.
>>>
>>> Yes, 10.10.8.55 is a region server and 10.10.8.54 is an HBase master.
>>> Any thoughts?
>>>
>>> Thanks
>>>
>>> On 02 Sep 2015, at 18:45, Ted Yu <[EMAIL PROTECTED]> wrote:
>>>
>>> bq. change the ip addresses of the cluster nodes
>>>
>>> Did this happen recently? If high iowait was observed after the change
>>> (you can look at the ganglia graph), there is a chance that the change was
>>> related.
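>>>
>>> (iowait can also be spot-checked directly on a node with e.g. iostat -x 5 or top,
>>> if a ganglia graph is not handy.)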
>>>
>>> BTW I assume 10.10.8.55 <http://10.10.8.55:50010/> is where your region