Zookeeper/Hbase storage type on EC2
Yves Langisch 2011-07-19, 17:37
Hi,
I plan to set up an HBase installation on EC2. As recommended, I want to set up a ZooKeeper ensemble with 3 nodes, but I'm not sure what kind of storage to choose for the two ZK directories (dataDir and dataLogDir). Do these two directories need to be on persistent storage that survives a node crash? Or does an ephemeral storage device suffice, since a failed node that is restarted is synchronized with the other two nodes anyway? And what happens when I restart the whole ZK ensemble with ephemeral storage, which means no ZK data is available after booting up? Is there any impact on the HBase cluster?
I've read through the documentation but was not able to answer those questions.
Thanks
Yves
Re: Zookeeper/Hbase storage type on EC2
Yves Langisch 2011-07-21, 08:21
I just need a statement on whether it makes sense to use ephemeral storage for ZK at all (in conjunction with HBase, if the answer depends on the use case).
Any help is appreciated.
Thanks in advance
Yves
Re: Zookeeper/Hbase storage type on EC2
Patrick Hunt 2011-07-21, 22:56
I don't think you want to use ephemeral storage, given that HBase would lose information if the ZK cluster were restarted. But really that's a better question for the HBase team; I don't know exactly how they use ZK or what effect such a loss would have on their application.
Regards,
Patrick
RE: Zookeeper/Hbase storage type on EC2
Laxman 2011-07-22, 11:17
Hi Pat,
Actually, HBase uses both ephemeral and persistent znodes. Ephemeral znodes are used for coordination; persistent znodes are used for storing metadata. So a ZK cluster restart does no harm either.
>> Do this two directories need to be on a persistent storage
>> which survives a node crash? Or does an ephemeral storage device suffice
>> since a failed node which is restarted is being synchronized with the other
>> two nodes anyway?
Yves, what exactly do you mean by ephemeral storage and persistent storage here?
ZK supports these two types of znodes, and they are used for the different purposes mentioned above. It's up to the application to decide which to use. Both types of znodes are persisted to local disk.
Hope this clarifies some of your questions.
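To make the distinction between the two znode types concrete, here is a minimal toy model in Python. It is only a sketch and not the ZooKeeper client API: the class `ToyZnodeStore`, its methods, and the znode paths are all hypothetical names chosen for illustration. The one behaviour it models is that an ephemeral znode is tied to the session that created it and vanishes when that session ends, while a persistent znode stays.

```python
# Toy in-memory model of znode lifetimes; NOT the ZooKeeper client API.
class ToyZnodeStore:
    def __init__(self):
        # path -> (data, owner session id, or None for a persistent znode)
        self._nodes = {}

    def create(self, path, data, session=None):
        """Create a znode; passing a session id makes it ephemeral."""
        if path in self._nodes:
            raise KeyError(f"znode already exists: {path}")
        self._nodes[path] = (data, session)

    def close_session(self, session):
        """Drop every ephemeral znode owned by the closed session."""
        if session is None:
            raise ValueError("a session id is required")
        self._nodes = {
            path: (data, owner)
            for path, (data, owner) in self._nodes.items()
            if owner != session
        }

    def paths(self):
        return sorted(self._nodes)


store = ToyZnodeStore()
store.create("/hbase/table/t1", b"ENABLED")            # persistent (hypothetical path)
store.create("/hbase/rs/server-1", b"", session="s1")  # ephemeral (hypothetical path)
store.close_session("s1")                              # e.g. the owning client went away
print(store.paths())  # ['/hbase/table/t1']
```

Note that this znode-level notion of "ephemeral" is independent of whether the server's disk is EC2 instance storage or an EBS volume.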
Thanks,
Laxman
Re: Zookeeper/Hbase storage type on EC2
Ted Dunning 2011-07-22, 17:50
I think that the OP was asking about ephemeral storage in the sense of instance storage on the EC2 VM, which disappears when the VM disappears. This is in contrast to the more persistent EBS volumes, which survive the VM.
Using ephemeral storage is fine for most ZK installations on EC2. Putting one node on an EBS volume provides a degree of disaster resilience, but putting a majority of nodes on EBS might have been less reliable during the EBS outage a few months ago.
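The trade-off between one EBS-backed node and a majority of EBS-backed nodes comes down to quorum arithmetic: a ZooKeeper ensemble keeps serving only while a strict majority of its nodes is up. The Python sketch below works through that arithmetic for two 3-node layouts. The node names, the storage labels, and the simplification that an EBS outage takes down every EBS-backed node at once are assumptions made for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def has_quorum(storage_by_node, failed_storage):
    """True if the nodes NOT on the failed storage type still form a majority."""
    alive = sum(1 for kind in storage_by_node.values() if kind != failed_storage)
    return alive >= quorum_size(len(storage_by_node))


# Hypothetical 3-node layouts: "instance" = EC2 instance storage, "ebs" = EBS volume.
one_ebs = {"zk1": "instance", "zk2": "instance", "zk3": "ebs"}
mostly_ebs = {"zk1": "ebs", "zk2": "ebs", "zk3": "instance"}

print(has_quorum(one_ebs, "ebs"))     # True: 2 of 3 nodes remain
print(has_quorum(mostly_ebs, "ebs"))  # False: only 1 of 3 nodes remains
```

With a single EBS-backed node, an EBS outage costs one node and the ensemble keeps its quorum, while that node's data still outlives a loss of the VMs.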
Re: Zookeeper/Hbase storage type on EC2
Andrew Purtell 2011-07-23, 05:42
HBase does not keep any persistent state in ZooKeeper.
You can restart your ZK cluster one peer at a time without affecting HBase.
If you are going to bring your entire ZK cluster down, first shut down HBase. Then, once ZK is started again, bring up HBase.
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
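The two restart procedures described in this message can be sketched as ordered step lists. This is a minimal Python illustration, not an operations script: the step strings and peer names are hypothetical, and a real rolling restart should wait for each peer to rejoin the ensemble before moving on to the next one.

```python
def rolling_restart_plan(zk_peers):
    """Restart one peer at a time so the other peers keep serving."""
    plan = []
    for peer in zk_peers:
        plan.append(f"stop {peer}")
        plan.append(f"start {peer}")  # assumed to rejoin before the next stop
    return plan


def full_restart_plan(zk_peers):
    """Stop HBase before ZooKeeper, and start it only after ZooKeeper is back."""
    return (
        ["stop hbase"]
        + [f"stop {peer}" for peer in zk_peers]
        + [f"start {peer}" for peer in zk_peers]
        + ["start hbase"]
    )


def max_peers_down(plan, zk_peers):
    """Largest number of ZK peers that are down at the same time during the plan."""
    down, worst = set(), 0
    for step in plan:
        action, target = step.split(" ", 1)
        if target in zk_peers:
            if action == "stop":
                down.add(target)
            else:
                down.discard(target)
            worst = max(worst, len(down))
    return worst


peers = ["zk1", "zk2", "zk3"]  # hypothetical peer names
print(max_peers_down(rolling_restart_plan(peers), peers))  # 1
print(max_peers_down(full_restart_plan(peers), peers))     # 3
```

In the rolling plan at most one of three peers is down at any moment, so a majority stays available; in the full plan all peers are down together, which is why HBase is stopped first and started last.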
Re: Zookeeper/Hbase storage type on EC2
Patrick Hunt 2011-07-25, 16:33
I was under the impression that there was some persistent data kept in ZK by hbase, good to know. Perhaps a FAQ entry for this?
Patrick
Re: Zookeeper/Hbase storage type on EC2
Ted Dunning 2011-07-25, 16:56
There is some persistent data.
It is just that it can be reconstructed.
This very felicitous fact is why snapshots work for free with MapR. If there were critical data in ZK, it would be considerably more ticklish to get a clean snap.