Why not a restart region server serve the WAL logs the last RS Write?
liushaohui 2013-01-16, 11:26
Dear HBase Devs,
When I restart the HBase cluster, all the region servers' WAL logs are split, even though all the region servers start up again immediately.
From the master code, I found that the HBase master identifies each region server by ip, port, and start time. From the master's point of view, a new region server with the same ip and port is therefore a different server from the old one, so the master puts the old region server's logs into the split queue. When the cluster has about 500 regions, it usually takes 2 to 4 minutes to bring all regions online. Why not let the restarted region server serve the old WAL logs, to avoid the log split and reduce recovery time?
There is the graceful rs-stop script, which makes the region server flush the memstores, close the regions, and delete the WAL logs before stopping.
But how can we reduce recovery time and prevent unnecessary log splits when a rack or a datacenter loses power?
Here are the logs:
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager:Registering server=sd-ml-hadoop23.bj,11600,1358320047485
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop26.bj,11600,1358320078576
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop25.bj,11600,1358320068311
2013-01-16 15:08:32,842 INFO org.apache.hadoop.hbase.master.ServerManager: Registering server=sd-ml-hadoop24.bj,11600,1358320057835
2013-01-16 15:08:32,845 WARN org.apache.hadoop.conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
2013-01-16 15:08:32,891 INFO org.apache.hadoop.hbase.master.ServerManager: Waiting for region servers count to settle; currently checked in 4, slept for 351 ms, expecting minimum of 1, maximum of 2147483647, timeout of 10000 ms, interval of 1500 ms.
2013-01-16 15:08:34,395 INFO org.apache.hadoop.hbase.master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 1854 ms, expecting minimum of 1, maximum of 2147483647, master is running.
2013-01-16 15:08:34,398 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://hdfs/hbase/sdtst-miliao/.logs/sd-ml-hadoop23.bj,11600,1358152154355 doesn't belong to a known region server, splitting
2013-01-16 15:08:34,398 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://hdfs/hbase/sdtst-miliao/.logs/sd-ml-hadoop23.bj,11600,1358320047485 belongs to an existing region server
-Shaohui Liu
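The last two log lines in the message above show the behaviour being asked about: both folders belong to host sd-ml-hadoop23.bj on port 11600, but only the one whose start time matches a registered server is kept. A minimal sketch of that comparison follows, assuming the folder name is simply host,port,start-time as it appears in the logs. The class and method names (WalFolderCheck, ServerId, needsSplit) are hypothetical and are not HBase's MasterFileSystem code.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the check seen in the logs; not HBase code. */
final class WalFolderCheck {

    /** Identity the master gives a region server: host, port, start time. */
    record ServerId(String host, int port, long startCode) {
        /** Parses a name such as "sd-ml-hadoop23.bj,11600,1358320047485". */
        static ServerId parse(String name) {
            String[] parts = name.split(",");
            return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }
    }

    /** A log folder is split when no live server has the same full identity. */
    static boolean needsSplit(String folderName, Set<ServerId> liveServers) {
        return !liveServers.contains(ServerId.parse(folderName));
    }

    public static void main(String[] args) {
        // The restarted server, as registered in the first log line.
        Set<ServerId> live = Set.of(ServerId.parse("sd-ml-hadoop23.bj,11600,1358320047485"));
        for (String folder : List.of(
                "sd-ml-hadoop23.bj,11600,1358152154355",    // written before the restart
                "sd-ml-hadoop23.bj,11600,1358320047485")) { // written after the restart
            System.out.println(folder + " -> " + (needsSplit(folder, live) ? "split" : "keep"));
        }
    }
}
```

Because the start time is part of the identity, a restart on the same host and port always produces a new identity, so the old folder never matches.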
RE: Why not a restart region server serve the WAL logs the last RS Write?
Anoop Sam John 2013-01-16, 12:04
Hi,
> prevent unnecessarily log splits when the power of rack or a datacenter is down?
In this scenario, why do you think that a log replay is not needed? Some data might have been only in the memstore. How would we get that data back into HBase? HBase considers only the data in HFiles and the memstore to be table data. If we don't replay the log, that data may exist only in the WAL, so how would a reader get it? Am I reading your question correctly, liushaohui?
-Anoop-
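Anoop's point, that unflushed edits exist only in the WAL after a crash, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: one region, string keys and values, and in-memory collections standing in for durable storage. ToyRegion and its methods are hypothetical names and do not correspond to HBase classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one region; hypothetical names, not HBase classes. */
final class ToyRegion {
    // Stand-ins for durable storage: these survive crash().
    private final List<Map.Entry<String, String>> wal = new ArrayList<>();
    private final Map<String, String> hfiles = new TreeMap<>();
    // Volatile state: lost on crash().
    private Map<String, String> memstore = new TreeMap<>();

    /** Write path: append to the WAL first, then update the memstore. */
    void put(String row, String value) {
        wal.add(Map.entry(row, value));
        memstore.put(row, value);
    }

    /** Flush: memstore contents become HFile data and the WAL can be dropped. */
    void flush() {
        hfiles.putAll(memstore);
        memstore.clear();
        wal.clear();
    }

    /** Power loss: only the in-memory state disappears. */
    void crash() {
        memstore = new TreeMap<>();
    }

    /** Recovery: re-apply every WAL edit to the memstore, in order. */
    void replayWal() {
        for (Map.Entry<String, String> edit : wal) {
            memstore.put(edit.getKey(), edit.getValue());
        }
    }

    /** Readers see only the memstore and the HFiles, never the WAL. */
    String get(String row) {
        String value = memstore.get(row);
        return value != null ? value : hfiles.get(row);
    }
}
```

In this model a row written after the last flush is invisible to get() after crash() until replayWal() runs, which is why some form of replay is needed even when every server restarts.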
Re: Why not a restart region server serve the WAL logs the last RS Write?
liushaohui 2013-01-16, 12:29
Maybe my description was not clear. To restore the data in the memstores, the log replay is necessary.
But if all the region servers start, and each region server replays the WAL logs written by the previous server on the same ip and port, the cluster may recover to the state it was in before the stop.
This would save one read/write pass over the WAL logs during log splitting.
If one or more region servers do not start, we can fall back to the normal log split and replay process.
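The proposal above could be sketched roughly as follows: match each old log folder to a restarted server on the same host and port, and fall back to the normal split if any old server has no restarted counterpart. This is an illustration of the idea in the message, not existing HBase behaviour. RestartRecoveryPlan, ServerId, and planLocalReplay are hypothetical names, and the sketch assumes each region is reopened on the server that previously hosted it, which the thread does not settle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the proposed recovery path; not HBase code. */
final class RestartRecoveryPlan {

    record ServerId(String host, int port, long startCode) {
        static ServerId parse(String name) {
            String[] parts = name.split(",");
            return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        /** Same machine and port, ignoring the start time. */
        boolean sameAddress(ServerId other) {
            return host.equals(other.host()) && port == other.port();
        }
    }

    /**
     * Maps each old log owner to the restarted server that would replay its logs.
     * Returns empty when any old server has no restarted counterpart, meaning the
     * caller should use the normal split-and-replay process instead.
     */
    static Optional<Map<ServerId, ServerId>> planLocalReplay(
            List<ServerId> oldLogOwners, List<ServerId> liveServers) {
        Map<ServerId, ServerId> plan = new LinkedHashMap<>();
        for (ServerId old : oldLogOwners) {
            Optional<ServerId> successor = liveServers.stream()
                    .filter(live -> live.sameAddress(old) && live.startCode() > old.startCode())
                    .findFirst();
            if (successor.isEmpty()) {
                return Optional.empty(); // at least one server did not come back
            }
            plan.put(old, successor.get());
        }
        return Optional.of(plan);
    }
}
```

The all-or-nothing return value mirrors the condition in the message: local replay is attempted only when every old server has restarted.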
Re: Why not a restart region server serve the WAL logs the last RS Write?
liushaohui 2013-01-16, 12:32
Maybe my description was not clear. To restore the data in the memstores, the log replay is necessary.
But if all the region servers start, and each region server replays the WAL logs written by the previous server on the same ip and port, the cluster may recover to the state it was in before the stop.
This would save one read/write pass over the WAL logs during log splitting.
If one or more region servers do not start, we can fall back to the normal log split and replay process.