HBase, mail # user - on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)


Andrew Purtell 2010-03-12, 20:26
During the Q&A period after my presentation at HUG9, it was interesting that some in the audience indicated they are running production Hadoop and/or HBase clusters on EC2. I want to follow up on some comments I made there.

This is a little surprising, because the HDFS NameNode is currently a single point of failure (SPOF) that can bring the whole service down. That the NameNode is a SPOF is less of a concern if you can engineer the particular server hosting the NameNode to be especially reliable. However, when architecting services on EC2, you must be mindful of its guarantees, or lack thereof. On EC2, the reliability of any given instance is not guaranteed; only the reliability of the service in the aggregate is.

Running Hadoop on top of EC2 in production is thus not advised until there is a good hot failover solution for the NameNode.

AWS offers a form of hosted Hadoop called Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/. Note that this service treats the Hadoop/HDFS cluster as a transient, unreliable construction. So should you.

Regarding a hot fail over solution for the NameNode, there is some really interesting work ongoing at the moment -- "AvatarNode", possibly with inclusion of "BookKeeper" in the architecture.
    http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
    http://issues.apache.org/jira/browse/HDFS-976

    http://issues.apache.org/jira/browse/HDFS-234
        http://issues.apache.org/jira/secure/attachment/12399656/create.png
        https://issues.apache.org/jira/browse/ZOOKEEPER-276

Once something like the above is vetted and tested, my advice above will of course change, and it will become possible to architect reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.

In the meantime, EC2 and similar IaaS clouds are a great resource for prototyping, research and development, and hosting ephemeral clusters for QA or end-to-end system tests. The HBase EC2 scripts are a useful tool for doing such things with relative ease.

Best regards,

   - Andy

----- Original Message ----
From: Jonathan Gray
To: [EMAIL PROTECTED]
Sent: Thu, March 11, 2010 3:01:22 PM
Subject: RE: [databasepro-48] HUG9

Pardon the link vomit, hopefully this comes across okay...

HBase Project Update by Jonathan Gray
http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&doget&target=HUG9_HBaseUpdate_JonathanGray.pdf

HBase and HDFS by Todd Lipcon of Cloudera
http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&doget&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf

HBase on EC2 by Andrew Purtell of Trend Micro
http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf