Re: How often should we reboot hbase cluster---looking for best practice
Good to know. Could you also comment on how different your internal
tool is from the new HBCK
(https://issues.apache.org/jira/browse/HBASE-5128), as Ted pointed out?
It would also be good if you could share your logs with us for the latter
"every week" cases.
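For anyone following the "hbck -fix" route Harsh mentioned, here is a rough sketch of checking consistency before and after a fix without restarting anything. It assumes the `hbase` launcher is on the PATH and that the hbck report contains a `Status: OK` line when the cluster is consistent. The function names are made up for illustration, so please check both assumptions against your own version.

```python
import subprocess


def run_hbck(fix=False):
    """Run `hbase hbck` (optionally with -fix) and return its output.

    Assumes the `hbase` launcher script is on the PATH.
    """
    cmd = ["hbase", "hbck"]
    if fix:
        cmd.append("-fix")
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return result.stdout


def is_consistent(hbck_output):
    """Return True if the report contains a 'Status: OK' line.

    Assumption: hbck ends its summary with 'Status: OK' or
    'Status: INCONSISTENT'; verify against your version's output.
    """
    return any(line.strip() == "Status: OK"
               for line in hbck_output.splitlines())


def check_and_fix():
    """Check, fix only if needed, then re-check. No restart involved."""
    if is_consistent(run_hbck()):
        return True
    run_hbck(fix=True)
    return is_consistent(run_hbck())
```

Keeping the output of each run would also give a history of which inconsistencies keep coming back from week to week.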
On Mon, May 28, 2012 at 11:50 PM, Xu, Richard <[EMAIL PROTECTED]> wrote:
> Overlapping regions (https://issues.apache.org/jira/browse/HBASE-4238) do not show up very often. We know that it is fixed after 0.90.5, but instead of upgrading the production HBase cluster, we have an internal tool (which calls the HBase APIs) to fix it.
> Regions out of sync (between META and HDFS) and Inactive regions appear more often --- we can see them every week; again, our internal tool handles these cases as well.
> -----Original Message-----
> From: Kevin O'dell [mailto:[EMAIL PROTECTED]]
> Sent: Monday, May 28, 2012 1:45 PM
> To: [EMAIL PROTECTED]
> Subject: Re: How often should we reboot hbase cluster---looking for best practice
> +1 to what Harsh said. It sounds to me like you are putting a bandaid on a
> flesh wound. We should do further analysis and get your cluster to a
> stable state rather than repairing it weekly. Can you also describe in
> more detail everything you are running to do the repair? In 0.90.4, fixing an
> overlapping region is not an easy task by any means.
> On Mon, May 28, 2012 at 9:14 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> If you're talking of "hbck -fix", then no you don't need to restart
>> HBase after it resolves your issues.
>> Would be good to investigate/know what causes such frequent
>> inconsistencies in your cluster though. It's not normal for
>> inconsistencies to appear regularly every week. Do your region servers
>> often crash weekly, for instance?
>> On Mon, May 28, 2012 at 9:35 PM, Xu, Richard <[EMAIL PROTECTED]> wrote:
>> > Hi folks,
>> > It is more like an operation question.
>> > The HBase version is 0.90.4. We have a weekly job to fix known issues such
>> > as the META table being out of sync and inactive/overlapping/dangling regions while
>> > HBase is online.
>> > Should we restart the HBase cluster right after the fix? What is the
>> > best practice here?
>> > Thanks in advance!
>> > Richard
>> Harsh J
> Kevin O'Dell
> Customer Operations Engineer, Cloudera