-
AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Tianying Chang 2013-01-25, 18:54
Hi
One machine crashed in our cluster. After 3 minutes, the master detected it and reassigned its regions to other region servers. The regions were back online on the other RSs within one minute. But the asynchbase client kept using the old, dead region server for 50 minutes, which caused data loss. We had to restart the AsynchBase client, and that fixed the problem.
It seems there is a bug in the AsynchBase client code. Has anyone else seen this? If I want to open a bug for AsynchBase, should I use the HBase JIRA, or is there a dedicated one for AsynchBase? I cannot seem to find a dedicated AsynchBase JIRA.
Thanks, Tian-Ying
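The failure mode described above (a client that keeps routing requests to a dead region server after META has changed) can be illustrated with a minimal sketch. This is an assumption-laden toy, not asynchbase's actual code: the names `RegionLocationCache`, `locate`, `invalidateServer`, and `metaLookup` are all hypothetical, and the META table is stood in for by a plain map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical client-side cache of region -> server locations (not asynchbase's real classes). */
final class RegionLocationCache {
    private final Map<String, String> regionToServer = new ConcurrentHashMap<>();
    private final Function<String, String> metaLookup; // stands in for a read of META

    RegionLocationCache(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    /** Returns the cached server for a region, consulting META only on a cache miss. */
    String locate(String region) {
        return regionToServer.computeIfAbsent(region, metaLookup);
    }

    /** Drops every cached entry pointing at the given server; returns how many were dropped. */
    int invalidateServer(String deadServer) {
        int removed = 0;
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            if (e.getValue().equals(deadServer) && regionToServer.remove(e.getKey(), deadServer)) {
                removed++;
            }
        }
        return removed;
    }
}

final class StaleCacheDemo {
    public static void main(String[] args) {
        Map<String, String> meta = new ConcurrentHashMap<>();
        meta.put("region-a", "rs1");
        RegionLocationCache cache = new RegionLocationCache(meta::get);

        System.out.println(cache.locate("region-a")); // rs1, read from META and cached
        meta.put("region-a", "rs2");                  // the master reassigns the region
        System.out.println(cache.locate("region-a")); // rs1, the stale cached entry
        cache.invalidateServer("rs1");                // the client notices rs1 is gone
        System.out.println(cache.locate("region-a")); // rs2, re-read from META
    }
}
```

If the invalidation step never runs for a dead server, the cache keeps answering with the old location no matter how quickly META is updated, which matches the symptom reported here. Whether that is what actually happens inside asynchbase is not something this sketch can establish.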
-
Re: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Ted Yu 2013-01-25, 18:58
Tianying: I moved user@ to Cc.
There is a Google group for asynchbase. Please subscribe to that group.
Can you clarify which version of asynchbase you're using?
Cheers
-
RE: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Tianying Chang 2013-01-25, 19:12
Ted
It is 1.3.1.
Thanks, Tian-Ying
-
Re: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Marcos Ortiz 2013-01-25, 21:29
Regards, Tianying.
AsynchBase is an open source project from StumbleUpon. You can find it on GitHub: https://github.com/stumbleupon/asynchbase
Best wishes
-- Marcos Ortiz Valmaseda, Technical Product Manager at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
-
RE: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Tianying Chang 2013-01-25, 23:31
Thanks, Marcos. Can I file a bug there, or in the Google group?
-
Re: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Shrijeet Paliwal 2013-01-25, 19:01
This has been raised earlier:
https://groups.google.com/d/topic/asynchbase/xE2lYE6CbmQ/discussion
https://groups.google.com/d/topic/asynchbase/nfLTwjdqq9M/discussion
It does look like a bug, but a hard one to reproduce. We have been seeing this in our production environment, and efforts are under way to reproduce it in a testing environment.
-- Shrijeet
-
RE: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Tianying Chang 2013-01-26, 01:28
Thanks, Shrijeet.
Thanks for the information! We have seen this a couple of times recently. Last week it lasted a very long time (40+ minutes before we restarted the client). I will follow up on that discussion thread. Thanks a lot!!
Tian-Ying
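Since restarting the client is the only remedy reported so far in this thread, one possible stopgap is a watchdog that notices when no write has succeeded for too long and signals that the client should be recreated. The sketch below is only an illustration under that assumption: `WriteStallWatchdog`, `recordSuccess`, and `shouldRestart` are hypothetical names, nothing here is part of asynchbase, and the five-minute threshold is arbitrary. Time is passed in explicitly so the logic can be exercised without waiting.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stall detector; none of these names come from asynchbase. */
final class WriteStallWatchdog {
    private final long maxStallMillis;
    private final AtomicLong lastSuccessMillis;

    WriteStallWatchdog(long maxStallMillis, long nowMillis) {
        this.maxStallMillis = maxStallMillis;
        this.lastSuccessMillis = new AtomicLong(nowMillis);
    }

    /** Call whenever a write is acknowledged, e.g. from a success callback. */
    void recordSuccess(long nowMillis) {
        lastSuccessMillis.set(nowMillis);
    }

    /** True once more than maxStallMillis has passed since the last acknowledged write. */
    boolean shouldRestart(long nowMillis) {
        return nowMillis - lastSuccessMillis.get() > maxStallMillis;
    }
}

final class WatchdogDemo {
    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000L; // arbitrary threshold for illustration
        long start = System.currentTimeMillis();
        WriteStallWatchdog watchdog = new WriteStallWatchdog(fiveMinutes, start);

        // One minute without a success: still within the threshold.
        System.out.println(watchdog.shouldRestart(start + 60_000L));
        // Six minutes without a success: time to recreate the client.
        System.out.println(watchdog.shouldRestart(start + 6 * 60_000L));
    }
}
```

A detector like this only shortens the outage; it does not address the underlying bug, and writes that fail during the stall still have to be retried or buffered by the application.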
-
Re: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
tsuna 2013-01-26, 06:53
This is bug number 1. I haven't been able to track it down, as I've never been able to reproduce it in a controlled fashion :(
https://github.com/OpenTSDB/asynchbase/issues/1
I also spent hours manually walking references in heap dumps and checking state to see if anything was wrong, but I haven't found anything, not even a clue.
-- Benoit "tsuna" Sigoure
-
Re: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
ishan chhabra 2013-01-29, 17:43
Hi Tsuna,
As Shrijeet mentioned, we (@Rocketfuel) were experiencing this bug internally when doing cluster restarts. After some trial and error, I was able to create a set of steps to reproduce this bug in a controlled fashion on our test cluster. Further, based on heap dumps and added debug messages, this looks like the cause and fix: https://github.com/OpenTSDB/asynchbase/pull/48. I have tested this repeatedly on the test cluster and things are looking fine. Please have a look and see whether this makes sense and whether the fix is a correct one.
Cheers,
Ishan
-
Re: AsynchBase client holds stale dead region server for a long time even after the META has already been updated.
Marcos Ortiz 2013-01-29, 14:07
Great to hear, Ishan. We faced a similar error here. We will test this with the fix that you propose.
Best wishes
-- Marcos Ortiz Valmaseda, Product Manager && Data Scientist at DATEC
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>