regionserver restartup error
hua beatls 2012-12-25, 12:14
Hi, we want to test whether a regionserver can be restarted this way: first, stop it with '$ kill -9 xxx' (xxx being the process number); second, start it again with '$ ./hbase-daemon.sh start regionserver'.
This does not work. I found the following errors in the regionserver's log:
2012-12-25 19:47:42,555 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Attempting connect to Master server at hadoop1,60000,1355887294437
2012-12-25 19:47:42,599 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at hadoop2/192.168.250.107:60020
2012-12-25 19:47:42,599 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at hadoop1,60000,1355887294437 that we are up with port=60020, startcode=1356436062169
2012-12-25 19:47:42,605 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server hadoop2,60020,1356436062169 has been rejected; Reported time is too far out of sync with master. Time difference of 64785ms > max allowed of 30000ms
Can we use 'kill -9 xxx (process number)', or should we use '$ ./bin/hbase-daemon.sh stop regionserver'?
This regionserver is a loaded one.
How should we restart it? Many thanks!
beatls
Re: regionserver restartup error
Nicolas Liochon 2012-12-25, 14:40
Hi,
First, check the date/time on both servers and make sure they don't differ; that's what the error says. You can configure the maximum allowed difference with "hbase.master.maxclockskew", but it's unlikely to be a good idea: in any distributed system it's always safer to have the servers share the same time. ntpd is often used for this.
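The check behind this error can be sketched in shell. This is a hypothetical helper, not HBase code: it assumes the master compares its own clock with the time the regionserver reports and rejects the server when the absolute difference exceeds the limit (30000 ms in the log above, which is the limit "hbase.master.maxclockskew" controls). The timestamps in the usage line are illustrative only, chosen 64785 ms apart like the difference in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HBase): succeeds when two clocks, given in
# milliseconds, differ by no more than max_skew_ms (default 30000, as in the log).
clock_skew_ok() {
  local master_ms=$1 server_ms=$2 max_skew_ms=${3:-30000}
  local diff=$(( master_ms - server_ms ))
  # Use the absolute difference: either clock may be ahead.
  (( diff < 0 )) && diff=$(( -diff ))
  echo "time difference: ${diff}ms, max allowed: ${max_skew_ms}ms"
  (( diff <= max_skew_ms ))
}

# Illustrative values 64785 ms apart. In practice you might read each host's
# clock with GNU date, e.g. `ssh hadoop1 'date +%s%3N'` (assumes GNU coreutils).
clock_skew_ok 100000 164785 || echo "regionserver would be rejected"
```

Fixing the clocks (for example with ntpd) removes the difference itself; raising the limit only tolerates it.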
Second, it's better to use the stop command than a kill, especially a kill -9. A stop lets the server cleanly close the regions it is handling and unregister itself from the master. With a kill -9, the master has to detect on its own that this regionserver is dead. By default that takes 3 minutes (the ZooKeeper timeout), and in the meantime the regions on this server are unavailable.
Lastly, the hbase-daemon script has a restart command: it does the stop and then the start.
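A minimal shell sketch of the stop-then-start sequence described above. The function name restart_regionserver and the $HBASE_HOME default are illustrative assumptions, not part of HBase; the script path is taken as an argument, or defaults to $HBASE_HOME/bin/hbase-daemon.sh (falling back to ./bin/hbase-daemon.sh). The built-in restart command is the simpler option; this only makes the order explicit.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: graceful stop, then start, instead of `kill -9` + start.
restart_regionserver() {
  local daemon=${1:-${HBASE_HOME:-.}/bin/hbase-daemon.sh}
  # 'stop' lets the regionserver close its regions and unregister from the
  # master, so the master does not wait for the ZooKeeper timeout to notice it.
  # Its exit status is not checked here, so the sketch makes no assumption
  # about how 'stop' behaves when the process is already gone.
  "$daemon" stop regionserver
  "$daemon" start regionserver
}

# Usage (roughly what `./bin/hbase-daemon.sh restart regionserver` does):
#   restart_regionserver /path/to/hbase/bin/hbase-daemon.sh
```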
Cheers,
Nicolas
Re: regionserver restartup error
hua beatls 2012-12-26, 16:59
Hi, yes, the date/time was different; we corrected the NTP configuration and the problem is solved. Thanks! beatls