吴限
2011-07-27, 15:58
Chris Tarnas
2011-07-27, 16:16
Buttler, David
2011-07-27, 16:17
Stack
2011-07-27, 16:18
吴限
2011-07-27, 16:23
吴限
2011-07-27, 16:23
Stack
2011-07-27, 16:29
吴限
2011-07-27, 16:33
吴限
2011-07-27, 16:35
Chris Tarnas
2011-07-27, 16:40
吴限
2011-07-27, 16:46
Suraj Varma
2011-07-27, 17:29
Jeff Whiting
2011-07-27, 23:33
Nico Guba
2011-07-28, 05:50
Xian Woo
2011-07-28, 12:52
Stack
2011-07-28, 16:02
-
data loss due to regionserver going down 吴限 2011-07-27, 15:58
Hi everyone. I'd like to run the following data loss scenario by you to see if we are doing something obviously wrong with our setup here.

Setup:
- cdh3u0
- Hadoop 0.20.2
- HBase 0.90.1
- 1 master node running as NameNode & JobTracker
- ZooKeeper quorum
- 2 child nodes, each running as DataNode, TaskTracker and RegionServer
- dfs.replication is set to 1

First, I inserted some data into HBase a few hours ago. Then, after a while, I rebooted one of the region servers and waited until the master responded to that. However, when I checked the table using the hbase shell (I used the "count" command), I noticed that a huge amount of data was missing.
After I restarted the regionserver which I had rebooted and checked again, I found that some of the missing data had come back, but some of it was still missing.
Finally, after I disabled the table and then enabled it again, I found that all the data was in the cluster and nothing had been lost.

This is problematic since we are supposed to replicate at x1, so at least one other node should theoretically be able to serve the data that the downed regionserver can't.

Questions:
- How can you explain this weird situation?
- Is there a way to recover such lost data?

Any tips here are definitely appreciated. I'll be happy to provide more information as well.
-
Re: data loss due to regionserver going down Chris Tarnas 2011-07-27, 16:16
Replication of 1x means no replication. 2x would mean the data exists in two locations (what it looks like you want). Running with a replication of 1x is a very bad idea and is pretty much a guaranteed way to get data loss.
-chris
-
RE: data loss due to regionserver going down Buttler, David 2011-07-27, 16:17
When replication is set to 1, that means there is only one copy of the data. If you take a node offline, any data on that node will be unavailable. In your scenario, try upping the replication factor to 2.

Dave
-
Re: data loss due to regionserver going down Stack 2011-07-27, 16:18
On Wed, Jul 27, 2011 at 8:58 AM, 吴限 <[EMAIL PROTECTED]> wrote:
> Setup:
> -cdh3u0
> - Hadoop 0.20.2

You are using the hadoop from cdh3u0?

> - dfs.replication is set to 1

You will lose data if a machine goes away. You have two machines but only one instance of each data block; think of it as half of your data on one node and the rest on another. If you kill one machine, half your data is gone.

> After I restarted the regionserver which I had rebooted and checked again,
> I found that some of the missing data was got back but there still existed
> some data which hadn't been found yet.

I wonder what was going on here that we didn't see it all restored.

> This is problematic since we are supposed to
> replicate at x1, so at least one other node should be able to
> theoretically serve the data that the downed regionserver can't.

No. The behavior you describe would come with replication of 2, not 1.

St.Ack
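The "half your data on each node" picture can be sketched with a toy model in Python. This is only an illustration under loud assumptions: blocks are spread round-robin over the two hosts named later in the thread (server4 and server5), which is not how HDFS actually chooses block locations, and the function names are hypothetical rather than part of any Hadoop API.

```python
def place_blocks(num_blocks, nodes, replication):
    """Toy placement: block i gets `replication` copies on consecutive nodes.

    Assumption for illustration only; real HDFS placement is different.
    """
    placement = {}
    for block in range(num_blocks):
        placement[block] = {
            nodes[(block + k) % len(nodes)] for k in range(replication)
        }
    return placement


def unavailable_blocks(placement, down_nodes):
    """Blocks whose every copy lives on a node that is currently down."""
    down = set(down_nodes)
    return [block for block, holders in placement.items() if holders <= down]


nodes = ["server4", "server5"]
for replication in (1, 2):
    placement = place_blocks(10, nodes, replication)
    lost = unavailable_blocks(placement, {"server4"})
    print(f"dfs.replication={replication}: {len(lost)} of 10 blocks unreadable")
```

In this model, taking server4 down with a single copy per block leaves every block that lived only on server4 unreadable until the node returns, while with two copies the surviving node still holds everything.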
-
Re: data loss due to regionserver going down 吴限 2011-07-27, 16:23
Thanks for your reply. But later I actually ran another experiment, similar to the one I explained earlier.

Step 1: I inserted some data into HBase.
Step 2: I shut down one of the region servers.
Step 3: I checked the table and found some data had been lost.
Step 4: I disabled the table and then enabled it.
Step 5: I checked again and found nothing lost.

If some of the data didn't exist on the other region server, then how can you explain this?

Hope to get your reply. Thanks~

2011/7/28 Chris Tarnas <[EMAIL PROTECTED]>
> Replication of 1x means no replication. 2x would mean the data exists in
> two locations (what it looks like you want). Running with a replication of
> 1x is a very bad idea and is pretty much a guaranteed way to get data loss.
>
> -chris
-
Re: data loss due to regionserver going down 吴限 2011-07-27, 16:23
Yep~ Is there anything wrong with that?

2011/7/28 Stack <[EMAIL PROTECTED]>
> > Setup:
> > -cdh3u0
> > - Hadoop 0.20.2
>
> You are using the hadoop from cdh3u0?
-
Re: data loss due to regionserver going down Stack 2011-07-27, 16:29
This I cannot explain. Check the blocks directory on the two servers. Maybe they were all under one datanode only.

St.Ack

2011/7/27 吴限 <[EMAIL PROTECTED]>:
> If some data didn't exist in the other region server, then how can u explain
> this?
-
Re: data loss due to regionserver going down 吴限 2011-07-27, 16:33
Dear Stack, thanks for your reply~

First, I don't know if there is something wrong with cdh3u0. And thanks for reminding me about the replication property, which I didn't quite understand before but understand now. I'll try to correct this mistake. But the situations I described really did happen with replication set to 1, and that's why I find them so weird. I just started trying HBase a month ago and there are a lot of things I don't quite understand yet.

Hope to get a reply. Thanks~
-
Re: data loss due to regionserver going down 吴限 2011-07-27, 16:35
Here is my hbase-site.xml:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://server3.yun.com:54310/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>server3.yun.com</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
-
Re: data loss due to regionserver going down Chris Tarnas 2011-07-27, 16:40
That is strange behavior. How long did you wait between Step 2 and 3, and what are the results of running

    hbase hbck

at step 3?

-chris
-
Re: data loss due to regionserver going down 吴限 2011-07-27, 16:46
I just kept checking http://master:60010.

Before Step 2:

Address                 Start Code      Load
server4.yun.com:60030   1311785159202   requests=0, regions=10, usedHeap=32, maxHeap=995
server5.yun.com:60030   1311768553647   requests=18, regions=7, usedHeap=117, maxHeap=995
Total: servers: 2   requests=18, regions=17

Then at Step 2, I shut down server4 and waited until the page showed this:

Address                 Start Code      Load
server5.yun.com:60030   1311768553647   requests=18, regions=17, usedHeap=117, maxHeap=995
Total: servers: 2   requests=18, regions=17

Then I continued with the following steps.

On July 28, 2011 at 12:40 AM, Chris Tarnas <[EMAIL PROTECTED]> wrote:
> That is strange behavior. How long did you wait between Step 2 and 3, and
> what is the results of running
>
> hbase hbck
>
> at step 3?
-
Re: data loss due to regionserver going down Suraj Varma 2011-07-27, 17:29
When you shut down the region server, check the master logs to see if the master has detected this condition. I've seen weird things happen if DNS is not set up correctly, so check that the master (logs & UI) is correctly detecting that the region server is down after step 2.

--Suraj
-
Re: data loss due to regionserver going down Jeff Whiting 2011-07-27, 23:33
Replication needs to be higher than 1. If you have a node which is running both DataNode and HRegionServer and then shut it down, you WILL lose all the data that the DataNode was holding, because no one else on the cluster has it. HBase relies on HDFS for the replication of data and does NOT have its own data replication mechanism, unlike Cassandra or Voldemort. If you set the HDFS replication factor to 3, then when you shut down your node, 2 other nodes will have the data and HBase will be able to serve that data for you.

You can think of each DataNode as a hard drive. Having a replication factor of 1 means the data is only on one hard drive, and if you unplug the hard drive that data will be lost. Having a replication factor greater than 1 is like having multiple hard drives in a RAID 1 (mirrored) array. If you unplug one of the hard drives, the data is still on the other ones and nothing is lost.

~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]
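The rule of thumb above can be written down as simple arithmetic, with one caveat that matters for a two-DataNode cluster like the one in this thread: HDFS keeps each copy of a block on a different DataNode, so the number of copies that actually exist cannot exceed the number of DataNodes. The Python sketch below uses hypothetical helper names and is not an HDFS API.

```python
def effective_replicas(replication_factor: int, datanodes: int) -> int:
    """Copies of each block that can actually exist: at most one per DataNode."""
    return min(replication_factor, datanodes)


def tolerated_node_failures(replication_factor: int, datanodes: int) -> int:
    """DataNodes that can be down at once while every block keeps a live copy."""
    return effective_replicas(replication_factor, datanodes) - 1


for factor in (1, 2, 3):
    print(
        f"dfs.replication={factor} on 2 DataNodes: "
        f"{effective_replicas(factor, 2)} copies, "
        f"survives {tolerated_node_failures(factor, 2)} node failure(s)"
    )
```

On two DataNodes, a factor of 3 therefore behaves like a factor of 2; getting two spare copies of every block would need at least three DataNodes.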
-
Re: data loss due to regionserver going down Nico Guba 2011-07-28, 05:50
Very interesting. What is a good value where there is not too much of a trade-off in performance?
I'd imagine that setting this too high could create a very 'chatty' cluster.

On 28 Jul 2011, at 00:33, Jeff Whiting wrote:
> Replication needs to be higher than 1.
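One rough way to frame the trade-off, as a sketch rather than a benchmark: every extra replica multiplies the raw disk used and the number of copies that have to be written for each block, since HDFS sends a block to each DataNode that holds a replica. For reference, the stock HDFS default for dfs.replication is 3. The 100 GB figure and the function name in the Python snippet below are made up for illustration.

```python
def replication_cost(logical_gb: float, replication_factor: int) -> dict:
    """Back-of-the-envelope cost of a replication factor (illustrative only)."""
    return {
        # Every block is stored once per replica.
        "raw_storage_gb": logical_gb * replication_factor,
        # Every block written by a client ends up on this many DataNodes.
        "copies_written_per_block": replication_factor,
    }


for factor in (1, 2, 3):
    print(f"dfs.replication={factor}: {replication_cost(100, factor)}")
```

The extra write traffic is the "chatty" part; reads are not multiplied in the same way, because a client only needs one of the copies.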
-
Re: data loss due to regionserver going down Xian Woo 2011-07-28, 12:52
Thanks, everybody. I really appreciate what you guys have done with my question. Indeed, the situation I came across is too complicated and too strange for me, so I've decided to re-install HBase and change the related configuration files. Hopefully it will go better this time. Thanks again!

Best wishes~
Woo.
-
Re: data loss due to regionserver going downStack 2011-07-28, 16:02
Running with 1 replica is unusual, and there is little motivation
for running with this configuration since it means data loss, so few have experience with it.
St.Ack
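For anyone landing on this thread later, here is a minimal sketch of the configuration change the replies point toward. It assumes the two-DataNode cluster described in the original post, so the value 2 is the highest replication factor that cluster can actually satisfy; on a cluster with three or more DataNodes, the value 3 mentioned earlier in the thread would be the usual choice. The fragment is illustrative and is not taken from the poster's files.

```xml
<!-- hdfs-site.xml (sketch; assumes a cluster with two DataNodes) -->
<configuration>
  <property>
    <!-- Number of copies HDFS keeps of each block.
         1 means a single copy, i.e. no redundancy at all. -->
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
```

Two caveats, as far as I know: dfs.replication is applied by the client when a file is written, so HBase has to see the same value (the poster's hbase-site.xml set it to 1), and files that already exist keep their old replication factor until it is changed explicitly, for example with `hadoop fs -setrep -R 2 /hbase`.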