HBase >> mail # user >> Errors after major compaction


Re: Errors after major compaction
Ted,
So if I understand correctly, the theory is that because of the issue
fixed in HBASE-3789, the master took too long to detect that the region was
successfully opened by the first server, so it force-closed it and
transitioned it to a second server. There are a few things about this
scenario I don't understand, probably because I don't know enough about the
inner workings of the region transition process, and I would appreciate it
if you could help me understand:
1. The RS opened the region at 16:37:49.
2. The master started handling the opened event at 16:39:54 - this delay can
probably be explained by HBASE-3789
3. At 16:39:54 the master log says: Opened region gs_raw_events,..... on
hadoop1-s05.farm-ny.gigya.com
4. Then at 16:40:00 the master log says: master:60000-0x13004a31d7804c4
Creating (or updating) unassigned node for
584dac5cc70d8682f71c4675a843c309 with OFFLINE state - why did it decide to
take the region offline after learning it was successfully opened?
5. Then it tries to reopen the region on hadoop1-s05, which indicates in its
log that the open request failed because the region was already open - why
didn't the master use that information to learn that the region was already
open?
6. At 16:43:57 the master decides the region transition timed out and starts
forcing the transition - HBASE-3789 again?
7. Now the master forces the transition of the region to hadoop1-s02, but
there is no sign of that on hadoop1-s05 - why doesn't the old RS
(hadoop1-s05) detect that it is no longer the master of the region and
relinquish control of it?
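
To put numbers on the delays in the timeline above, here is a minimal Python
sketch that computes the gaps between the four timestamps quoted in the list.
The event labels are shorthand for illustration, not text from the logs, and
all four timestamps are assumed to fall on the same day.

```python
from datetime import datetime

# Timestamps quoted in the numbered list (assumed to be on the same day).
# Labels are illustrative shorthand, not log messages.
events = [
    ("RS opened region", "16:37:49"),
    ("master handled OPENED event", "16:39:54"),
    ("master set node to OFFLINE", "16:40:00"),
    ("master timed out the transition", "16:43:57"),
]


def gaps(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    parsed = [(label, datetime.strptime(ts, "%H:%M:%S")) for label, ts in events]
    return [
        (a_label, b_label, int((b - a).total_seconds()))
        for (a_label, a), (b_label, b) in zip(parsed, parsed[1:])
    ]


for start, end, seconds in gaps(events):
    print(f"{start} -> {end}: {seconds}s")
```

The three gaps come out to 125, 6, and 237 seconds, so the two long waits
are the ones in items 2 and 6.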

Thanks.

-eran

On Sun, Jul 3, 2011 at 20:09, Ted Yu <[EMAIL PROTECTED]> wrote:

> HBASE-3789 should have sped up region assignment.
> The patch for 0.90 is attached to that JIRA.
>
> You may prudently apply that patch.
>
> Regards
>
> On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
>
> > Thanks Ted, but, as stated before, I'm already using 0.90.3, so either
> > it's not fixed or it's not the same thing.
> >
> > -eran
> >
> >
> >
> > On Sun, Jul 3, 2011 at 17:27, Ted Yu <[EMAIL PROTECTED]> wrote:
> >
> > > Eran:
> > > I was thinking of this:
> > > HBASE-3789  Cleanup the locking contention in the master
> > >
> > > though it doesn't directly handle 'PENDING_OPEN for too long' case.
> > >
> > > https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and
> > > actually close to the symptom you described.
> > >
> > > On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner <[EMAIL PROTECTED]> wrote:
> > >
> > > > It does seem that both servers opened the same region around the same
> > > > time.
> > > > The region was offline because I disabled the table so I can change
> > > > its TTL.
> > > >
> > > > Here is the log from hadoop1-s05:
> > > > 2011-06-29 16:37:12,576 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > 2011-06-29 16:37:12,680 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
> > > > 2011-06-29 16:37:12,680 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x33004a38816050b Attempting to transition node 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> > > > 2011-06-29 16:37:12,711 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x33004a38816050b Successfully transitioned node 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> > > > 2011-06-29 16:37:12,711 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: REGION => {NAME => 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.',