|
Howard
2012-07-17, 12:09
Ted Yu
2012-07-17, 13:43
Howard
2012-07-17, 14:01
Ramkrishna.S.Vasudevan
2012-07-17, 14:30
Ted Yu
2012-07-17, 15:04
Howard
2012-07-18, 02:11
Suraj Varma
2012-07-18, 23:06
Howard
2012-07-27, 09:43
Ted Yu
2012-07-27, 13:49
deanforwever2010
2012-07-30, 05:46
yuzhihong@...
2012-07-30, 07:50
deanforwever2010
2012-07-30, 10:26
Howard
2012-07-30, 08:27
|
-
Load balancer repeatedly close and open region in the same regionserver.Howard 2012-07-17, 12:09
Through the master and regionserver log,I find load balancer repeatedly
close and open region in the same regionserver(period in hbase.balancer.period ). Does this is a bug in load balancer and how can I dig into or avoid this? The log like this: 2012-07-16 00:12:50,055 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_status_list,zOwMwdi,1342294447440.7608037379153932a30f4bdd6d7edf5d. 2012-07-16 00:12:50,069 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_comment_list,zO7vqv1,1342192347676.0c9036e183055ff2df93eca348aeeada. 2012-07-16 00:12:50,206 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_status_list,zOzYDL9,1342294447440.47eb162f6aecffcddfa897fdbb70472b. 2012-07-16 00:12:50,301 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_status_list,zOM1HZP,1342293292587.9af5de3db0bad0470d9b8fe6b9eadf14. 2012-07-16 00:12:50,428 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_status_list,zOPKLDi,1342293292587.f89dbfc1dad366c991eba3c248228637. 2012-07-16 00:12:50,562 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. 2012-07-16 00:12:50,685 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_status_list,zOIR8Tw,1342283152067.6284e9f69042266cde88a7b19071d701. 2012-07-16 00:12:50,791 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: trackurl_comment_list,zOqzaNw,1342195266309.bd8e19c1063f4f2984f9ca63365c5573. +
Howard 2012-07-17, 12:09
-
Re: Load balancer repeatedly close and open region in the same regionserver.Ted Yu 2012-07-17, 13:43
Which HBase version were you using ?
The log messages below were in very short time period. Can you provide more logs ? Thanks On Tue, Jul 17, 2012 at 5:09 AM, Howard <[EMAIL PROTECTED]> wrote: > Through the master and regionserver log,I find load balancer repeatedly > close and open region in the same regionserver(period in > hbase.balancer.period ). > Does this is a bug in load balancer and how can I dig into or avoid this? > The log like this: > 2012-07-16 00:12:50,055 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_status_list,zOwMwdi,1342294447440.7608037379153932a30f4bdd6d7edf5d. > 2012-07-16 00:12:50,069 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_comment_list,zO7vqv1,1342192347676.0c9036e183055ff2df93eca348aeeada. > 2012-07-16 00:12:50,206 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_status_list,zOzYDL9,1342294447440.47eb162f6aecffcddfa897fdbb70472b. > 2012-07-16 00:12:50,301 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_status_list,zOM1HZP,1342293292587.9af5de3db0bad0470d9b8fe6b9eadf14. > 2012-07-16 00:12:50,428 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_status_list,zOPKLDi,1342293292587.f89dbfc1dad366c991eba3c248228637. > 2012-07-16 00:12:50,562 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > 2012-07-16 00:12:50,685 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_status_list,zOIR8Tw,1342283152067.6284e9f69042266cde88a7b19071d701. > 2012-07-16 00:12:50,791 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to > open region: > > trackurl_comment_list,zOqzaNw,1342195266309.bd8e19c1063f4f2984f9ca63365c5573. > +
Ted Yu 2012-07-17, 13:43
-
Re: Load balancer repeatedly close and open region in the same regionserver.Howard 2012-07-17, 14:01
*hi,Ted Yu,thanks for your reply:*
the hbase and hadoop version is HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 the following is a detail log about the same region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956, and it repeats again and again.: 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956., src=192.168.1.2,60020,1342017399608, dest=192.168.1.2,60020,1342002082592 2012-07-16 00:12:49,843 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. (offlining) 2012-07-16 00:12:49,843 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4384d0a47f40068 Creating unassigned node for 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state 2012-07-16 00:12:49,845 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 192.168.1.2,60020,1342017399608 for region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. 2012-07-16 00:12:50,555 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, region=93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:12:50,555 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:12:50,555 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. state=CLOSED, ts=1342368770556, server=192.168.1.2,60020,1342017399608 2012-07-16 00:12:50,555 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4384d0a47f40068 Creating (or updating) unassigned node for 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state 2012-07-16 00:12:50,558 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=10.75.18.34,60000,1342017369575, region=93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:12:50,558 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. destination server is 192.168.1.2,60020,1342002082592 2012-07-16 00:12:50,558 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956.; plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956., src=192.168.1.2,60020,1342017399608, dest=192.168.1.2,60020,1342002082592 2012-07-16 00:12:50,558 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. to 192.168.1.2,60020,1342002082592 2012-07-16 00:12:50,574 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=192.168.1.2,60020,1342017399608, region=93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:12:50,635 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=192.168.1.2,60020,1342017399608, region=93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:12:50,639 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=192.168.1.2,60020,1342017399608, region=93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:12:50,639 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. from 192.168.1.2,60020,1342017399608; deleting unassigned node 2012-07-16 00:12:50,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4384d0a47f40068 Deleting existing unassigned node for 93caf5147d40f5dd4625e160e1b7e956 that is in expected state RS_ZK_REGION_OPENED 2012-07-16 00:12:50,641 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. has been deleted. 2012-07-16 00:12:50,641 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4384d0a47f40068 Successfully deleted unassigned node for region 93caf5147d40f5dd4625e160e1b7e956 in expected state RS_ZK_REGION_OPENED 2012-07-16 00:12:50,641 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. that was online on 192.168.1.2,60020,1342017399608 2012-07-16 00:17:49,870 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956., src=192.168.1.2,60020,1342017399608, dest=192.168.1.2,60020,1342002082592 2012-07-16 00:17:49,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. (offlining) 2012-07-16 00:17:49,870 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x4384d0a47f40068 Creating unassigned node for 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state 2012-07-16 00:17:49,872 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 192.168.1.2,60020,1342017399608 for region trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. 2012-07-16 00:17:50,464 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, region=93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:17:50,464 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 93caf5147d40f5dd4625e160e1b7e956 2012-07-16 00:17:50,464 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=trackurl_ +
Howard 2012-07-17, 14:01
-
RE: Load balancer repeatedly close and open region in the same regionserver.Ramkrishna.S.Vasudevan 2012-07-17, 14:30
>From the logs I can see that though the server's are same their start code
is different. Need to analyse the previous logs also. Pls file a JIRA, if possible attach the logs to that. Thanks Howard. Regards Ram > -----Original Message----- > From: Howard [mailto:[EMAIL PROTECTED]] > Sent: Tuesday, July 17, 2012 7:32 PM > To: [EMAIL PROTECTED] > Subject: Re: Load balancer repeatedly close and open region in the same > regionserver. > > *hi,Ted Yu,thanks for your reply:* > the hbase and hadoop version is > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > the following is a detail log about the same region > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > e956, > and it repeats again and again.: > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: > balance > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > e1b7e956., > src=192.168.1.2,60020,1342017399608, > dest=192.168.1.2,60020,1342002082592 > 2012-07-16 00:12:49,843 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment > of > region > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > e956. > (offlining) > 2012-07-16 00:12:49,843 DEBUG > org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x4384d0a47f40068 Creating unassigned node for > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > 2012-07-16 00:12:49,845 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > 192.168.1.2,60020,1342017399608 for region > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > e956. > 2012-07-16 00:12:50,555 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, > region=93caf5147d40f5dd4625e160e1b7e956 > 2012-07-16 00:12:50,555 DEBUG > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling > CLOSED > event for 93caf5147d40f5dd4625e160e1b7e956 > 2012-07-16 00:12:50,555 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > e1b7e956. > state=CLOSED, ts=1342368770556, server=192.168.1.2,60020,1342017399608 > 2012-07-16 00:12:50,555 DEBUG > org.apache.hadoop.hbase.zookeeper.ZKAssign: > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned node > for > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > 2012-07-16 00:12:50,558 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=M_ZK_REGION_OFFLINE, server=10.75.18.34,60000,1342017369575, > region=93caf5147d40f5dd4625e160e1b7e956 > 2012-07-16 00:12:50,558 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing > plan > for > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > e956. > destination server is 192.168.1.2,60020,1342002082592 > 2012-07-16 00:12:50,558 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing > plan > for region > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > e956.; > plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462 > 5e160e1b7e956., > src=192.168.1.2,60020,1342017399608, > dest=192.168.1.2,60020,1342002082592 > 2012-07-16 00:12:50,558 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > e956. > to 192.168.1.2,60020,1342002082592 > 2012-07-16 00:12:50,574 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENING, > server=192.168.1.2,60020,1342017399608, > region=93caf5147d40f5dd4625e160e1b7e956 > 2012-07-16 00:12:50,635 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENING, > server=192.168.1.2,60020,1342017399608, > region=93caf5147d40f5dd4625e160e1b7e956 > 2012-07-16 00:12:50,639 DEBUG +
Ramkrishna.S.Vasudevan 2012-07-17, 14:30
-
Re: Load balancer repeatedly close and open region in the same regionserver.Ted Yu 2012-07-17, 15:04
Howard:
Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out yesterday ? I guess you have noticed the following toward the end of log snippet: 2012-07-16 00:17:50,774 DEBUG org.apache.hadoop.hbase. master.handler.OpenedRegionHandler: Handling OPENED event for trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. from 192.168.1.2,60020,1342017399608; deleting unassigned node As Ram pointed out, there might be two region server processes running on 192.168.1.2 Please verify whether that was the case. Cheers On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < [EMAIL PROTECTED]> wrote: > From the logs I can see that though the server's are same their start code > is different. > Need to analyse the previous logs also. Pls file a JIRA, if possible > attach > the logs to that. > > Thanks Howard. > > Regards > Ram > > > -----Original Message----- > > From: Howard [mailto:[EMAIL PROTECTED]] > > Sent: Tuesday, July 17, 2012 7:32 PM > > To: [EMAIL PROTECTED] > > Subject: Re: Load balancer repeatedly close and open region in the same > > regionserver. > > > > *hi,Ted Yu,thanks for your reply:* > > the hbase and hadoop version is > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > > the following is a detail log about the same region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956, > > and it repeats again and again.: > > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: > > balance > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956., > > src=192.168.1.2,60020,1342017399608, > > dest=192.168.1.2,60020,1342002082592 > > 2012-07-16 00:12:49,843 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment > > of > > region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > (offlining) > > 2012-07-16 00:12:49,843 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Creating unassigned node for > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > > 2012-07-16 00:12:49,845 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > > 192.168.1.2,60020,1342017399608 for region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling > > CLOSED > > event for 93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > e1b7e956. > > state=CLOSED, ts=1342368770556, server=192.168.1.2,60020,1342017399608 > > 2012-07-16 00:12:50,555 DEBUG > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned node > > for > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > transition=M_ZK_REGION_OFFLINE, server=10.75.18.34,60000,1342017369575, > > region=93caf5147d40f5dd4625e160e1b7e956 > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Found an existing > > plan > > for > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956. > > destination server is 192.168.1.2,60020,1342002082592 > > 2012-07-16 00:12:50,558 DEBUG > > org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing > > plan > > for region > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > e956.; > > plan=hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd462 +
Ted Yu 2012-07-17, 15:04
-
Re: Load balancer repeatedly close and open region in the same regionserver.Howard 2012-07-18, 02:11
this problem just only once,Because it happens two day before,I remember I
check the master-status and only always see regions is "pending open" in Regions in Transition,not see there was two regionservers in the same server. "Sent CLOSE to 192.168.0.2,60020,1342017399608",what does "60020,1342017399608" mean?Is there some document can help to read the source code? If still need to upload the log,how to upload the log? sorry I am a freshman with HBase. 2012/7/17 Ted Yu <[EMAIL PROTECTED]> > Howard: > Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out > yesterday ? > I guess you have noticed the following toward the end of log snippet: > > 2012-07-16 00:17:50,774 DEBUG > org.apache.hadoop.hbase. > master.handler.OpenedRegionHandler: Handling OPENED > event for > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > from 192.168.1.2,60020,1342017399608; deleting unassigned node > > As Ram pointed out, there might be two region server processes running on > 192.168.1.2 > > Please verify whether that was the case. > > Cheers > > On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < > [EMAIL PROTECTED]> wrote: > > > From the logs I can see that though the server's are same their start > code > > is different. > > Need to analyse the previous logs also. Pls file a JIRA, if possible > > attach > > the logs to that. > > > > Thanks Howard. > > > > Regards > > Ram > > > > > -----Original Message----- > > > From: Howard [mailto:[EMAIL PROTECTED]] > > > Sent: Tuesday, July 17, 2012 7:32 PM > > > To: [EMAIL PROTECTED] > > > Subject: Re: Load balancer repeatedly close and open region in the same > > > regionserver. > > > > > > *hi,Ted Yu,thanks for your reply:* > > > the hbase and hadoop version is > > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > > > the following is a detail log about the same region > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > > e956, > > > and it repeats again and again.: > > > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: > > > balance > > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > > e1b7e956., > > > src=192.168.1.2,60020,1342017399608, > > > dest=192.168.1.2,60020,1342002082592 > > > 2012-07-16 00:12:49,843 DEBUG > > > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment > > > of > > > region > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > > e956. > > > (offlining) > > > 2012-07-16 00:12:49,843 DEBUG > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > master:60000-0x4384d0a47f40068 Creating unassigned node for > > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state > > > 2012-07-16 00:12:49,845 DEBUG > > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to > > > 192.168.1.2,60020,1342017399608 for region > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > > e956. > > > 2012-07-16 00:12:50,555 DEBUG > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > > transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, > > > region=93caf5147d40f5dd4625e160e1b7e956 > > > 2012-07-16 00:12:50,555 DEBUG > > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling > > > CLOSED > > > event for 93caf5147d40f5dd4625e160e1b7e956 > > > 2012-07-16 00:12:50,555 DEBUG > > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; > > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > > > e1b7e956. > > > state=CLOSED, ts=1342368770556, server=192.168.1.2,60020,1342017399608 > > > 2012-07-16 00:12:50,555 DEBUG > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > master:60000-0x4384d0a47f40068 Creating (or updating) unassigned node > > > for > > > 93caf5147d40f5dd4625e160e1b7e956 with OFFLINE state > > > 2012-07-16 00:12:50,558 DEBUG +
Howard 2012-07-18, 02:11
-
Re: Load balancer repeatedly close and open region in the same regionserver.Suraj Varma 2012-07-18, 23:06
You can use pastebin.com or similar services to cut/paste your logs.
--S On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: > this problem just only once,Because it happens two day before,I remember I > check the master-status and only always see regions is "pending open" in > Regions in Transition,not see there was two regionservers in the same > server. > > "Sent CLOSE to 192.168.0.2,60020,1342017399608",what > does "60020,1342017399608" mean?Is there some document can help to read > the source code? > If still need to upload the log,how to upload the log? > sorry I am a freshman with HBase. > > 2012/7/17 Ted Yu <[EMAIL PROTECTED]> > >> Howard: >> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out >> yesterday ? >> I guess you have noticed the following toward the end of log snippet: >> >> 2012-07-16 00:17:50,774 DEBUG >> org.apache.hadoop.hbase. >> master.handler.OpenedRegionHandler: Handling OPENED >> event for >> >> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. >> from 192.168.1.2,60020,1342017399608; deleting unassigned node >> >> As Ram pointed out, there might be two region server processes running on >> 192.168.1.2 >> >> Please verify whether that was the case. >> >> Cheers >> >> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < >> [EMAIL PROTECTED]> wrote: >> >> > From the logs I can see that though the server's are same their start >> code >> > is different. >> > Need to analyse the previous logs also. Pls file a JIRA, if possible >> > attach >> > the logs to that. >> > >> > Thanks Howard. >> > >> > Regards >> > Ram >> > >> > > -----Original Message----- >> > > From: Howard [mailto:[EMAIL PROTECTED]] >> > > Sent: Tuesday, July 17, 2012 7:32 PM >> > > To: [EMAIL PROTECTED] >> > > Subject: Re: Load balancer repeatedly close and open region in the same >> > > regionserver. >> > > >> > > *hi,Ted Yu,thanks for your reply:* >> > > the hbase and hadoop version is >> > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, >> > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 >> > > the following is a detail log about the same region >> > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 >> > > e956, >> > > and it repeats again and again.: >> > > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: >> > > balance >> > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 >> > > e1b7e956., >> > > src=192.168.1.2,60020,1342017399608, >> > > dest=192.168.1.2,60020,1342002082592 >> > > 2012-07-16 00:12:49,843 DEBUG >> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment >> > > of >> > > region >> > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 >> > > e956. >> > > (offlining) >> > > 2012-07-16 00:12:49,843 DEBUG >> > > org.apache.hadoop.hbase.zookeeper.ZKAssign: >> > > master:60000-0x4384d0a47f40068 Creating unassigned node for >> > > 93caf5147d40f5dd4625e160e1b7e956 in a CLOSING state >> > > 2012-07-16 00:12:49,845 DEBUG >> > > org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to >> > > 192.168.1.2,60020,1342017399608 for region >> > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 >> > > e956. >> > > 2012-07-16 00:12:50,555 DEBUG >> > > org.apache.hadoop.hbase.master.AssignmentManager: Handling >> > > transition=RS_ZK_REGION_CLOSED, server=192.168.1.2,60020,1342017399608, >> > > region=93caf5147d40f5dd4625e160e1b7e956 >> > > 2012-07-16 00:12:50,555 DEBUG >> > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling >> > > CLOSED >> > > event for 93caf5147d40f5dd4625e160e1b7e956 >> > > 2012-07-16 00:12:50,555 DEBUG >> > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; >> > > was=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 >> > > e1b7e956. >> > > state=CLOSED, ts=1342368770556, server=192.168.1.2,60020,1342017399608 >> > > 2012-07-16 00:12:50,555 DEBUG +
Suraj Varma 2012-07-18, 23:06
-
Re: Load balancer repeatedly close and open region in the same regionserver.Howard 2012-07-27, 09:43
Thanks Suraj Varma,I have put the log into the pastebin.com.
master log: http://pastebin.com/QWv3K9HQ regionserver log:http://pastebin.com/LM27ui72 Because there is a lot of "region is not online" in the regionserver log,so I have filter this in the regionserver log. The following is the count of "Region is not online:" log,start 23:16,there is a lot of access fail because the region is not online. --------------------------d70285c1a12dec9289ce9290c9349a79 1 23:16 103 23:36 142 23:37 169 23:38 94 23:39 120 23:40 39 23:41 110 23:42 104 23:43 114 23:44 90 23:45 121 23:46 104 23:47 74 23:48 96 23:49 100 23:50 125 23:51 59 23:52 113 23:53 134 23:54 127 23:55 131 23:56 119 23:57 82 23:58 165 23:59 and the region "d70285c1a12dec9289ce9290c9349a79" is move between two regionserver again and again by balancer.Start 23:36,the region is move from regionserver 192.168.18.40 to 192.168.18.40 and fail. 2012/7/19 Suraj Varma <[EMAIL PROTECTED]> > You can use pastebin.com or similar services to cut/paste your logs. > --S > > On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: > > this problem just only once,Because it happens two day before,I remember > I > > check the master-status and only always see regions is "pending open" in > > Regions in Transition,not see there was two regionservers in the same > > server. > > > > "Sent CLOSE to 192.168.0.2,60020,1342017399608",what > > does "60020,1342017399608" mean?Is there some document can help to read > > the source code? > > If still need to upload the log,how to upload the log? > > sorry I am a freshman with HBase. > > > > 2012/7/17 Ted Yu <[EMAIL PROTECTED]> > > > >> Howard: > >> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out > >> yesterday ? > >> I guess you have noticed the following toward the end of log snippet: > >> > >> 2012-07-16 00:17:50,774 DEBUG > >> org.apache.hadoop.hbase. > >> master.handler.OpenedRegionHandler: Handling OPENED > >> event for > >> > >> > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > >> from 192.168.1.2,60020,1342017399608; deleting unassigned node > >> > >> As Ram pointed out, there might be two region server processes running > on > >> 192.168.1.2 > >> > >> Please verify whether that was the case. > >> > >> Cheers > >> > >> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < > >> [EMAIL PROTECTED]> wrote: > >> > >> > From the logs I can see that though the server's are same their start > >> code > >> > is different. > >> > Need to analyse the previous logs also. Pls file a JIRA, if possible > >> > attach > >> > the logs to that. > >> > > >> > Thanks Howard. > >> > > >> > Regards > >> > Ram > >> > > >> > > -----Original Message----- > >> > > From: Howard [mailto:[EMAIL PROTECTED]] > >> > > Sent: Tuesday, July 17, 2012 7:32 PM > >> > > To: [EMAIL PROTECTED] > >> > > Subject: Re: Load balancer repeatedly close and open region in the > same > >> > > regionserver. > >> > > > >> > > *hi,Ted Yu,thanks for your reply:* > >> > > the hbase and hadoop version is > >> > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > >> > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > >> > > the following is a detail log about the same region > >> > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > >> > > e956, > >> > > and it repeats again and again.: > >> > > 2012-07-16 00:12:49,843 INFO org.apache.hadoop.hbase.master.HMaster: > >> > > balance > >> > > > hri=trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160 > >> > > e1b7e956., > >> > > src=192.168.1.2,60020,1342017399608, > >> > > dest=192.168.1.2,60020,1342002082592 > >> > > 2012-07-16 00:12:49,843 DEBUG > >> > > org.apache.hadoop.hbase.master.AssignmentManager: Starting > unassignment > >> > > of > >> > > region > >> > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 +
Howard 2012-07-27, 09:43
-
Re: Load balancer repeatedly close and open region in the same regionserver.Ted Yu 2012-07-27, 13:49
bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40
Have you checked whether there were two region server processes running on 192.168.18.40 ? Cheers On Fri, Jul 27, 2012 at 2:43 AM, Howard <[EMAIL PROTECTED]> wrote: > Thanks Suraj Varma,I have put the log into the pastebin.com. > > master log: http://pastebin.com/QWv3K9HQ > regionserver log:http://pastebin.com/LM27ui72 > > Because there is a lot of "region is not online" in the regionserver log,so > I have filter this in the regionserver log. > The following is the count of "Region is not online:" log,start 23:16,there > is a lot of access fail because the region is not online. > --------------------------d70285c1a12dec9289ce9290c9349a79 > 1 23:16 > 103 23:36 > 142 23:37 > 169 23:38 > 94 23:39 > 120 23:40 > 39 23:41 > 110 23:42 > 104 23:43 > 114 23:44 > 90 23:45 > 121 23:46 > 104 23:47 > 74 23:48 > 96 23:49 > 100 23:50 > 125 23:51 > 59 23:52 > 113 23:53 > 134 23:54 > 127 23:55 > 131 23:56 > 119 23:57 > 82 23:58 > 165 23:59 > > and the region "d70285c1a12dec9289ce9290c9349a79" is move between two > regionserver again and again by balancer.Start 23:36,the region is move > from regionserver 192.168.18.40 to 192.168.18.40 and fail. > > > 2012/7/19 Suraj Varma <[EMAIL PROTECTED]> > > > You can use pastebin.com or similar services to cut/paste your logs. > > --S > > > > On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: > > > this problem just only once,Because it happens two day before,I > remember > > I > > > check the master-status and only always see regions is "pending open" > in > > > Regions in Transition,not see there was two regionservers in the same > > > server. > > > > > > "Sent CLOSE to 192.168.0.2,60020,1342017399608",what > > > does "60020,1342017399608" mean?Is there some document can help to > read > > > the source code? > > > If still need to upload the log,how to upload the log? > > > sorry I am a freshman with HBase. > > > > > > 2012/7/17 Ted Yu <[EMAIL PROTECTED]> > > > > > >> Howard: > > >> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out > > >> yesterday ? > > >> I guess you have noticed the following toward the end of log snippet: > > >> > > >> 2012-07-16 00:17:50,774 DEBUG > > >> org.apache.hadoop.hbase. > > >> master.handler.OpenedRegionHandler: Handling OPENED > > >> event for > > >> > > >> > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > > >> from 192.168.1.2,60020,1342017399608; deleting unassigned node > > >> > > >> As Ram pointed out, there might be two region server processes running > > on > > >> 192.168.1.2 > > >> > > >> Please verify whether that was the case. > > >> > > >> Cheers > > >> > > >> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < > > >> [EMAIL PROTECTED]> wrote: > > >> > > >> > From the logs I can see that though the server's are same their > start > > >> code > > >> > is different. > > >> > Need to analyse the previous logs also. Pls file a JIRA, if > possible > > >> > attach > > >> > the logs to that. > > >> > > > >> > Thanks Howard. > > >> > > > >> > Regards > > >> > Ram > > >> > > > >> > > -----Original Message----- > > >> > > From: Howard [mailto:[EMAIL PROTECTED]] > > >> > > Sent: Tuesday, July 17, 2012 7:32 PM > > >> > > To: [EMAIL PROTECTED] > > >> > > Subject: Re: Load balancer repeatedly close and open region in the > > same > > >> > > regionserver. > > >> > > > > >> > > *hi,Ted Yu,thanks for your reply:* > > >> > > the hbase and hadoop version is > > >> > > HBase Version0.94.0, r1332822Hadoop Version0.20.2-cdh3u1, > > >> > > rbdafb1dbffd0d5f2fbc6ee022e1c8df6500fd638 > > >> > > the following is a detail log about the same region > > >> > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7 > > >> > > e956, > > >> > > and it repeats again and again.: > > >> > > 2012-07-16 00:12:49,843 INFO +
Ted Yu 2012-07-27, 13:49
-
Re: Load balancer repeatedly close and open region in the same regionserver.deanforwever2010 2012-07-30, 05:46
hi Ted,I am in the same team of Howard's
We didn't found two region server processes running on 192.168.18.40 2012/7/27 Ted Yu <[EMAIL PROTECTED]> > bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40 > > Have you checked whether there were two region server processes running on > 192.168.18.40 ? > > Cheers > > On Fri, Jul 27, 2012 at 2:43 AM, Howard <[EMAIL PROTECTED]> wrote: > > > Thanks Suraj Varma,I have put the log into the pastebin.com. > > > > master log: http://pastebin.com/QWv3K9HQ > > regionserver log:http://pastebin.com/LM27ui72 > > > > Because there is a lot of "region is not online" in the regionserver > log,so > > I have filter this in the regionserver log. > > The following is the count of "Region is not online:" log,start > 23:16,there > > is a lot of access fail because the region is not online. > > --------------------------d70285c1a12dec9289ce9290c9349a79 > > 1 23:16 > > 103 23:36 > > 142 23:37 > > 169 23:38 > > 94 23:39 > > 120 23:40 > > 39 23:41 > > 110 23:42 > > 104 23:43 > > 114 23:44 > > 90 23:45 > > 121 23:46 > > 104 23:47 > > 74 23:48 > > 96 23:49 > > 100 23:50 > > 125 23:51 > > 59 23:52 > > 113 23:53 > > 134 23:54 > > 127 23:55 > > 131 23:56 > > 119 23:57 > > 82 23:58 > > 165 23:59 > > > > and the region "d70285c1a12dec9289ce9290c9349a79" is move between two > > regionserver again and again by balancer.Start 23:36,the region is move > > from regionserver 192.168.18.40 to 192.168.18.40 and fail. > > > > > > 2012/7/19 Suraj Varma <[EMAIL PROTECTED]> > > > > > You can use pastebin.com or similar services to cut/paste your logs. > > > --S > > > > > > On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: > > > > this problem just only once,Because it happens two day before,I > > remember > > > I > > > > check the master-status and only always see regions is "pending open" > > in > > > > Regions in Transition,not see there was two regionservers in the > same > > > > server. > > > > > > > > "Sent CLOSE to 192.168.0.2,60020,1342017399608",what > > > > does "60020,1342017399608" mean?Is there some document can help to > > read > > > > the source code? > > > > If still need to upload the log,how to upload the log? > > > > sorry I am a freshman with HBase. > > > > > > > > 2012/7/17 Ted Yu <[EMAIL PROTECTED]> > > > > > > > >> Howard: > > > >> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out > > > >> yesterday ? > > > >> I guess you have noticed the following toward the end of log > snippet: > > > >> > > > >> 2012-07-16 00:17:50,774 DEBUG > > > >> org.apache.hadoop.hbase. > > > >> master.handler.OpenedRegionHandler: Handling OPENED > > > >> event for > > > >> > > > >> > > > > > > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > > > >> from 192.168.1.2,60020,1342017399608; deleting unassigned node > > > >> > > > >> As Ram pointed out, there might be two region server processes > running > > > on > > > >> 192.168.1.2 > > > >> > > > >> Please verify whether that was the case. > > > >> > > > >> Cheers > > > >> > > > >> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < > > > >> [EMAIL PROTECTED]> wrote: > > > >> > > > >> > From the logs I can see that though the server's are same their > > start > > > >> code > > > >> > is different. > > > >> > Need to analyse the previous logs also. Pls file a JIRA, if > > possible > > > >> > attach > > > >> > the logs to that. > > > >> > > > > >> > Thanks Howard. > > > >> > > > > >> > Regards > > > >> > Ram > > > >> > > > > >> > > -----Original Message----- > > > >> > > From: Howard [mailto:[EMAIL PROTECTED]] > > > >> > > Sent: Tuesday, July 17, 2012 7:32 PM > > > >> > > To: [EMAIL PROTECTED] > > > >> > > Subject: Re: Load balancer repeatedly close and open region in > the > > > same > > > >> > > regionserver. > > > >> > > > > > >> > > *hi,Ted Yu,thanks for your reply:* +
deanforwever2010 2012-07-30, 05:46
-
Re: Load balancer repeatedly close and open region in the same regionserver.yuzhihong@... 2012-07-30, 07:50
Can you trace master log to see why there were two region servers on that ip with different start codes ?
Thanks On Jul 29, 2012, at 10:46 PM, deanforwever2010 <[EMAIL PROTECTED]> wrote: > hi Ted,I am in the same team of Howard's > We didn't found two region server processes running on > 192.168.18.40 > > > 2012/7/27 Ted Yu <[EMAIL PROTECTED]> > >> bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40 >> >> Have you checked whether there were two region server processes running on >> 192.168.18.40 ? >> >> Cheers >> >> On Fri, Jul 27, 2012 at 2:43 AM, Howard <[EMAIL PROTECTED]> wrote: >> >>> Thanks Suraj Varma,I have put the log into the pastebin.com. >>> >>> master log: http://pastebin.com/QWv3K9HQ >>> regionserver log:http://pastebin.com/LM27ui72 >>> >>> Because there is a lot of "region is not online" in the regionserver >> log,so >>> I have filter this in the regionserver log. >>> The following is the count of "Region is not online:" log,start >> 23:16,there >>> is a lot of access fail because the region is not online. >>> --------------------------d70285c1a12dec9289ce9290c9349a79 >>> 1 23:16 >>> 103 23:36 >>> 142 23:37 >>> 169 23:38 >>> 94 23:39 >>> 120 23:40 >>> 39 23:41 >>> 110 23:42 >>> 104 23:43 >>> 114 23:44 >>> 90 23:45 >>> 121 23:46 >>> 104 23:47 >>> 74 23:48 >>> 96 23:49 >>> 100 23:50 >>> 125 23:51 >>> 59 23:52 >>> 113 23:53 >>> 134 23:54 >>> 127 23:55 >>> 131 23:56 >>> 119 23:57 >>> 82 23:58 >>> 165 23:59 >>> >>> and the region "d70285c1a12dec9289ce9290c9349a79" is move between two >>> regionserver again and again by balancer.Start 23:36,the region is move >>> from regionserver 192.168.18.40 to 192.168.18.40 and fail. >>> >>> >>> 2012/7/19 Suraj Varma <[EMAIL PROTECTED]> >>> >>>> You can use pastebin.com or similar services to cut/paste your logs. >>>> --S >>>> >>>> On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: >>>>> this problem just only once,Because it happens two day before,I >>> remember >>>> I >>>>> check the master-status and only always see regions is "pending open" >>> in >>>>> Regions in Transition,not see there was two regionservers in the >> same >>>>> server. >>>>> >>>>> "Sent CLOSE to 192.168.0.2,60020,1342017399608",what >>>>> does "60020,1342017399608" mean?Is there some document can help to >>> read >>>>> the source code? >>>>> If still need to upload the log,how to upload the log? >>>>> sorry I am a freshman with HBase. >>>>> >>>>> 2012/7/17 Ted Yu <[EMAIL PROTECTED]> >>>>> >>>>>> Howard: >>>>>> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out >>>>>> yesterday ? >>>>>> I guess you have noticed the following toward the end of log >> snippet: >>>>>> >>>>>> 2012-07-16 00:17:50,774 DEBUG >>>>>> org.apache.hadoop.hbase. >>>>>> master.handler.OpenedRegionHandler: Handling OPENED >>>>>> event for >>>>>> >>>>>> >>>> >>> >> trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. >>>>>> from 192.168.1.2,60020,1342017399608; deleting unassigned node >>>>>> >>>>>> As Ram pointed out, there might be two region server processes >> running >>>> on >>>>>> 192.168.1.2 >>>>>> >>>>>> Please verify whether that was the case. >>>>>> >>>>>> Cheers >>>>>> >>>>>> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < >>>>>> [EMAIL PROTECTED]> wrote: >>>>>> >>>>>>> From the logs I can see that though the server's are same their >>> start >>>>>> code >>>>>>> is different. >>>>>>> Need to analyse the previous logs also. Pls file a JIRA, if >>> possible >>>>>>> attach >>>>>>> the logs to that. >>>>>>> >>>>>>> Thanks Howard. >>>>>>> >>>>>>> Regards >>>>>>> Ram >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: Howard [mailto:[EMAIL PROTECTED]] >>>>>>>> Sent: Tuesday, July 17, 2012 7:32 PM >>>>>>>> To: [EMAIL PROTECTED] >>>>>>>> Subject: Re: Load balancer repeatedly close and open region in +
yuzhihong@... 2012-07-30, 07:50
-
Re: Load balancer repeatedly close and open region in the same regionserver.deanforwever2010 2012-07-30, 10:26
by my understanding, even if we started two region server on one machine,
the master will check if the region server with the same ip and port had registered, if registered ,the later one will be ignored. you can see the source code in org.apache.hadoop.hbase.master.ServerManager 2012/7/30 <[EMAIL PROTECTED]> > Can you trace master log to see why there were two region servers on that > ip with different start codes ? > > Thanks > > > > On Jul 29, 2012, at 10:46 PM, deanforwever2010 <[EMAIL PROTECTED]> > wrote: > > > hi Ted,I am in the same team of Howard's > > We didn't found two region server processes running on > > 192.168.18.40 > > > > > > 2012/7/27 Ted Yu <[EMAIL PROTECTED]> > > > >> bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40 > >> > >> Have you checked whether there were two region server processes running > on > >> 192.168.18.40 ? > >> > >> Cheers > >> > >> On Fri, Jul 27, 2012 at 2:43 AM, Howard <[EMAIL PROTECTED]> wrote: > >> > >>> Thanks Suraj Varma,I have put the log into the pastebin.com. > >>> > >>> master log: http://pastebin.com/QWv3K9HQ > >>> regionserver log:http://pastebin.com/LM27ui72 > >>> > >>> Because there is a lot of "region is not online" in the regionserver > >> log,so > >>> I have filter this in the regionserver log. > >>> The following is the count of "Region is not online:" log,start > >> 23:16,there > >>> is a lot of access fail because the region is not online. > >>> --------------------------d70285c1a12dec9289ce9290c9349a79 > >>> 1 23:16 > >>> 103 23:36 > >>> 142 23:37 > >>> 169 23:38 > >>> 94 23:39 > >>> 120 23:40 > >>> 39 23:41 > >>> 110 23:42 > >>> 104 23:43 > >>> 114 23:44 > >>> 90 23:45 > >>> 121 23:46 > >>> 104 23:47 > >>> 74 23:48 > >>> 96 23:49 > >>> 100 23:50 > >>> 125 23:51 > >>> 59 23:52 > >>> 113 23:53 > >>> 134 23:54 > >>> 127 23:55 > >>> 131 23:56 > >>> 119 23:57 > >>> 82 23:58 > >>> 165 23:59 > >>> > >>> and the region "d70285c1a12dec9289ce9290c9349a79" is move between two > >>> regionserver again and again by balancer.Start 23:36,the region is move > >>> from regionserver 192.168.18.40 to 192.168.18.40 and fail. > >>> > >>> > >>> 2012/7/19 Suraj Varma <[EMAIL PROTECTED]> > >>> > >>>> You can use pastebin.com or similar services to cut/paste your logs. > >>>> --S > >>>> > >>>> On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: > >>>>> this problem just only once,Because it happens two day before,I > >>> remember > >>>> I > >>>>> check the master-status and only always see regions is "pending open" > >>> in > >>>>> Regions in Transition,not see there was two regionservers in the > >> same > >>>>> server. > >>>>> > >>>>> "Sent CLOSE to 192.168.0.2,60020,1342017399608",what > >>>>> does "60020,1342017399608" mean?Is there some document can help to > >>> read > >>>>> the source code? > >>>>> If still need to upload the log,how to upload the log? > >>>>> sorry I am a freshman with HBase. > >>>>> > >>>>> 2012/7/17 Ted Yu <[EMAIL PROTECTED]> > >>>>> > >>>>>> Howard: > >>>>>> Before filing JIRA, can you verify with 0.94.1 RC that Lars sent out > >>>>>> yesterday ? > >>>>>> I guess you have noticed the following toward the end of log > >> snippet: > >>>>>> > >>>>>> 2012-07-16 00:17:50,774 DEBUG > >>>>>> org.apache.hadoop.hbase. > >>>>>> master.handler.OpenedRegionHandler: Handling OPENED > >>>>>> event for > >>>>>> > >>>>>> > >>>> > >>> > >> > trackurl_status_list,zO6u4o8,1342291884831.93caf5147d40f5dd4625e160e1b7e956. > >>>>>> from 192.168.1.2,60020,1342017399608; deleting unassigned node > >>>>>> > >>>>>> As Ram pointed out, there might be two region server processes > >> running > >>>> on > >>>>>> 192.168.1.2 > >>>>>> > >>>>>> Please verify whether that was the case. > >>>>>> > >>>>>> Cheers > >>>>>> > >>>>>> On Tue, Jul 17, 2012 at 7:30 AM, Ramkrishna.S.Vasudevan < > >>>>>> [EMAIL PROTECTED]> wrote: > >>>>>> > >>>>>>> From the logs I can see that though the server's are same their +
deanforwever2010 2012-07-30, 10:26
-
Re: Load balancer repeatedly close and open region in the same regionserver.Howard 2012-07-30, 08:27
we checked the regionserver using ps aux,we find only one regionserver
instance. we checked zookeeper,using ls,we find there is only one regionserver info. [zk: localhost:2181(CONNECTED) 3] ls /unimportance-hbase/rs [192.168.18.38,60020,1343569932859, 192.168.18.39,60020,1343569991763, 192.168.18.37,60020,1343569903104] but we checked the master log,we find when we restart the hbase cluster using graceful_stop as the following script: #!/bin/sh HBASE_BACKUP_MASTERS="$HBASE_CONF_DIR/backupmasters" cd $HBASE_HOME echo "stop backup master $HBASE_BACKUP_MASTER ..." ./bin/hbase-daemons.sh --config "${HBASE_CONF_DIR}" --hosts "${HBASE_BACKUP_MASTERS}" stop master echo "restart Master&RegionSevers ..." ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master for i in `cat $HBASE_CONF_DIR/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done echo "start backup master $HBASE_BACKUP_MASTER ..." ./bin/hbase-daemons.sh --config "${HBASE_CONF_DIR}" --hosts "${HBASE_BACKUP_MASTERS}" start master echo "balance_switch true"|hbase shell echo "balance_switch true"|hbase shell we find there was three register info about the same regionserver,and we find all the regionserver are in the same situation. hbase-hadoop-master-h018034.eos.grid.com.cn.log.2012-07-29:2012-07-29 21:51:17,880 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.18.38,60020,1342695597783 hbase-hadoop-master-h018034.eos.grid.com.cn.log.2012-07-29:2012-07-29 21:52:13,571 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.18.38,60020,1343569932859 hbase-hadoop-master-h018034.eos.grid.com.cn.log.2012-07-29:2012-07-29 22:17:50,254 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=192.168.18.38,60020,1343569932859 2012/7/30 <[EMAIL PROTECTED]> > Can you trace master log to see why there were two region servers on that > ip with different start codes ? > > Thanks > > > > On Jul 29, 2012, at 10:46 PM, deanforwever2010 <[EMAIL PROTECTED]> > wrote: > > > hi Ted,I am in the same team of Howard's > > We didn't found two region server processes running on > > 192.168.18.40 > > > > > > 2012/7/27 Ted Yu <[EMAIL PROTECTED]> > > > >> bq. the region is move from regionserver 192.168.18.40 to 192.168.18.40 > >> > >> Have you checked whether there were two region server processes running > on > >> 192.168.18.40 ? > >> > >> Cheers > >> > >> On Fri, Jul 27, 2012 at 2:43 AM, Howard <[EMAIL PROTECTED]> wrote: > >> > >>> Thanks Suraj Varma,I have put the log into the pastebin.com. > >>> > >>> master log: http://pastebin.com/QWv3K9HQ > >>> regionserver log:http://pastebin.com/LM27ui72 > >>> > >>> Because there is a lot of "region is not online" in the regionserver > >> log,so > >>> I have filter this in the regionserver log. > >>> The following is the count of "Region is not online:" log,start > >> 23:16,there > >>> is a lot of access fail because the region is not online. > >>> --------------------------d70285c1a12dec9289ce9290c9349a79 > >>> 1 23:16 > >>> 103 23:36 > >>> 142 23:37 > >>> 169 23:38 > >>> 94 23:39 > >>> 120 23:40 > >>> 39 23:41 > >>> 110 23:42 > >>> 104 23:43 > >>> 114 23:44 > >>> 90 23:45 > >>> 121 23:46 > >>> 104 23:47 > >>> 74 23:48 > >>> 96 23:49 > >>> 100 23:50 > >>> 125 23:51 > >>> 59 23:52 > >>> 113 23:53 > >>> 134 23:54 > >>> 127 23:55 > >>> 131 23:56 > >>> 119 23:57 > >>> 82 23:58 > >>> 165 23:59 > >>> > >>> and the region "d70285c1a12dec9289ce9290c9349a79" is move between two > >>> regionserver again and again by balancer.Start 23:36,the region is move > >>> from regionserver 192.168.18.40 to 192.168.18.40 and fail. > >>> > >>> > >>> 2012/7/19 Suraj Varma <[EMAIL PROTECTED]> > >>> > >>>> You can use pastebin.com or similar services to cut/paste your logs. > >>>> --S > >>>> > >>>> On Tue, Jul 17, 2012 at 7:11 PM, Howard <[EMAIL PROTECTED]> wrote: > >>>>> this problem just only once,Because it happens two day before,I +
Howard 2012-07-30, 08:27
|