HDFS >> mail # dev >> Re: [jira] [Created] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS


Re: [jira] [Created] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
Hi,

For your first question: if you deploy QJM HA, the NameNodes don't need to
share highly-reliable, sophisticated storage.

--Send from my Sony mobile.
On Jun 30, 2013 11:38 AM, "Yonghwan Kim (JIRA)" <[EMAIL PROTECTED]> wrote:

> Yonghwan Kim created HDFS-4945:
> ----------------------------------
>
>              Summary: A Distributed and Cooperative NameNode Cluster for a
> Highly-Available HDFS
>                  Key: HDFS-4945
>                  URL: https://issues.apache.org/jira/browse/HDFS-4945
>              Project: Hadoop HDFS
>           Issue Type: New Feature
>           Components: auto-failover
>     Affects Versions: HA branch (HDFS-1623)
>             Reporter: Yonghwan Kim
>
>
> Recently, Hadoop has attracted much attention from engineers and researchers
> as an emerging and effective framework for Big Data.
> HDFS (Hadoop Distributed File System) can manage huge amounts of data while
> guaranteeing high performance and reliability
> with only commodity hardware.
>
> However, HDFS requires a single master node, called the NameNode, to manage
> the entire namespace (or all the i-nodes)
> of a file system. This creates a SPOF (Single Point Of Failure)
> because the file system becomes inaccessible
> when the NameNode fails. (HDFS-2064)
>
> This also creates a performance bottleneck, since all access requests
> to the file system have to contact the
> NameNode. Hadoop 2.0 resolves the SPOF problem by introducing manual
> failover based on two NameNodes, Active and Standby.
> However, it still has the performance bottleneck, since all
> access requests have to contact the Active NameNode
> in ordinary operation. It may also lose the advantage of using commodity
> hardware, since the two NameNodes have to
> share highly-reliable, sophisticated storage.
>
> Here we propose a new HDFS architecture to resolve all the problems
> mentioned above.
> The proposed architecture has the following features and advantages.
>
> 1. Multiple NameNodes (not restricted to two) can be utilized to improve
> availability.
> The entire namespace of a file system is partitioned into several
> fragments, and replicas of each fragment are
> dispersed among the NameNodes.  When each fragment has k replicas, the
> file system can tolerate up to
> floor(k/2 - 1) faulty NameNodes.
>
> 2. Multiple NameNodes can be utilized to improve performance. The
> performance bottleneck caused by a single
> NameNode can be circumvented by assigning different NameNodes to different
> fragments as the primary ones
> (or the entry points).
>
> 3. The highly-reliable storage shared by the NameNodes is removed by
> introducing a message-based consistency
> mechanism among the NameNodes.  The architecture requires only commodity
> hardware.
>
>
>
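As a rough check on the fault-tolerance figure in item 1 of the quoted proposal, the sketch below evaluates the bound exactly as written, floor(k/2 - 1), for a few replica counts k. For comparison only, it also prints the usual majority-quorum bound floor((k - 1)/2). That second formula is an assumption added here and is not stated in the proposal, and both function names are hypothetical.

```python
import math


def tolerated_faults_as_stated(k: int) -> int:
    """Bound as written in the quoted proposal: floor(k/2 - 1).

    Clamped at zero, because a negative fault count is meaningless.
    """
    return max(0, math.floor(k / 2 - 1))


def tolerated_faults_majority(k: int) -> int:
    """Majority-quorum bound floor((k - 1) / 2).

    Assumption for comparison only; the proposal does not state this formula.
    """
    return (k - 1) // 2


if __name__ == "__main__":
    print("k  as_stated  majority")
    for k in range(2, 8):
        print(k, tolerated_faults_as_stated(k), tolerated_faults_majority(k))
```

With k = 3 the formula as written tolerates no faulty NameNode, while a majority quorum would tolerate one, so it may be worth confirming on the JIRA issue which bound the authors intend.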