

Re: Errors after major compaction
Ted,
So if I understand correctly, the theory is that because of the issue
fixed in HBASE-3789, the master took too long to detect that the region was
successfully opened by the first server, so it force-closed the region and
transitioned it to a second server. But there are a few things about this
scenario I don't understand, probably because I don't know enough about the
inner workings of the region transition process. I would appreciate it if
you could help me understand:
1. The RS opened the region at 16:37:49.
2. The master started handling the opened event at 16:39:54 - this delay can
probably be explained by HBASE-3789
3. At 16:39:54 the master log says: Opened region gs_raw_events,..... on
hadoop1-s05.farm-ny.gigya.com
4. Then at 16:40:00 the master log says: master:60000-0x13004a31d7804c4
Creating (or updating) unassigned node for 584dac5cc70d8682f71c4675a843c309
with OFFLINE state - why did it decide to take the region offline after
learning it was successfully opened?
5. Then it tries to reopen the region on hadoop1-s05, which indicates in its
log that the open request failed because the region was already open - why
didn't the master use that information to learn that the region was already
open?
6. At 16:43:57 the master decides the region transition timed out and starts
forcing the transition - HBASE-3789 again?
7. Now the master forces the transition of the region to hadoop1-s02 but
there is no sign of that on hadoop1-s05 - why doesn't the old RS
(hadoop1-s05) detect that it no longer owns the region and relinquish
control of it?
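
The delays in question can be tallied with a minimal Python sketch. It assumes the four timestamps quoted in items 1, 2, 4 and 6 all fall on the same day; the event labels are paraphrases of those items, and `gaps` is a hypothetical helper, not anything from HBase.

```python
from datetime import datetime

# Timestamps copied from items 1, 2, 4 and 6 of the list above.
# Assumption: all four fall on the same day. Labels are paraphrases.
EVENTS = [
    ("16:37:49", "RS hadoop1-s05 opened the region"),
    ("16:39:54", "master started handling the opened event"),
    ("16:40:00", "master set the unassigned node to OFFLINE"),
    ("16:43:57", "master timed out the transition and forced it"),
]


def gaps(events):
    """Return (seconds since previous event, label) for each event after the first."""
    times = [datetime.strptime(stamp, "%H:%M:%S") for stamp, _ in events]
    return [
        (int((cur - prev).total_seconds()), label)
        for prev, cur, (_, label) in zip(times, times[1:], events[1:])
    ]


if __name__ == "__main__":
    for seconds, label in gaps(EVENTS):
        print(f"+{seconds:>4}s  {label}")
```

Under that assumption, the master reacted to the open about two minutes late, set the node to OFFLINE six seconds after that, and declared the timeout roughly four minutes later.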

Thanks.

-eran

On Sun, Jul 3, 2011 at 20:09, Ted Yu <[EMAIL PROTECTED]> wrote:

> HBASE-3789 should have sped up region assignment.
> The patch for 0.90 is attached to that JIRA.
>
> It would be prudent to apply that patch.
>
> Regards
>
> On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> > Thanks Ted, but, as stated before, I'm already using 0.90.3, so either
> > it's not fixed or it's not the same thing.
> >
> > -eran
> >
> >
> >
> > On Sun, Jul 3, 2011 at 17:27, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Eran:
> > > I was thinking of this:
> > > HBASE-3789  Cleanup the locking contention in the master
> > >
> > > though it doesn't directly handle the 'PENDING_OPEN for too long' case.
> > >
> > > https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and
> > > actually close to the symptom you described.
> > >
> > > On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > >
> > > > It does seem that both servers opened the same region around the same
> > > > time. The region was offline because I disabled the table so I could
> > > > change its TTL.
> > > >
> > > > Here is the log from hadoop1-s05:
> > > > 2011-06-29 16:37:12,576 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > 2011-06-29 16:37:12,680 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > 2011-06-29 16:37:12,680 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x33004a38816050b Attempting to transition node 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> > > > 2011-06-29 16:37:12,711 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x33004a38816050b Successfully transitioned node 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> > > > 2011-06-29 16:37:12,711 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.',