Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
HDFS, mail # dev - Re: [jira] [Created] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS


Copy link to this message
-
Re: [jira] [Created] (HDFS-4945) A Distributed and Cooperative NameNode Cluster for a Highly-Available HDFS
Azuryy Yu 2013-06-30, 03:47
hi,

fot your first question, if you deploy QJM ha, it doesnt need share
highly-reliable sophisticated storage.

--Send from my Sony mobile.
On Jun 30, 2013 11:38 AM, "Yonghwan Kim (JIRA)" <[EMAIL PROTECTED]> wrote:

> Yonghwan Kim created HDFS-4945:
> ----------------------------------
>
>              Summary: A Distributed and Cooperative NameNode Cluster for a
> Highly-Available HDFS
>                  Key: HDFS-4945
>                  URL: https://issues.apache.org/jira/browse/HDFS-4945
>              Project: Hadoop HDFS
>           Issue Type: New Feature
>           Components: auto-failover
>     Affects Versions: HA branch (HDFS-1623)
>             Reporter: Yonghwan Kim
>
>
> Recently, Hadoop attracts much attention of engineers and researchers as
> an emerging and effective framework for Big Data.
> HDFS(Hadoop Distributed File System) can manage huge amount of data with
> guaranteeing high performance and reliability
> with only commodity hardware.
>
> However, HDFS requires a single master node, called NameNode, to manage
> the entire namespace (or all the i-nodes)
> of a file system. This causes SPOF (Single Point Of Failure) problem
> because the file system becomes inaccessible
> when the NameNode fails. (HDFS-2064)
>
> This also causes a bottleneck of efficiency since all the access requests
> to the file system have to contact the
> NameNode. Hadoop 2.0 resolves the SPOF problem by introducing manual
> failover based on two NameNodes, Active and Standby.
> However, it still has the efficiency bottleneck problem since all the
> access requests have to contact the Active
> in ordinary executions. It may also lose an advantage of using commodity
> hardware since the two NameNodes have to
> share a highly-reliable sophisticated storage.
>
> We here propose a new HDFS architecture to resolve all the problems
> mentioned above.
> The proposed architecture has the following features and advantages.
>
> 1. Multiple NameNodes (not restricted to two) can be utilized to improve
> availability.
> The entire namespace of a file system is partitioned into several
> fragments, and replicas of each fragment are
> dispersed among the NameNodes.  When each fragment has k replicas, the
> file system can tolerate up to
> floor(k/2 - 1) faulty NameNodes.
>
> 2. Multiple NameNodes can be utilized to improve performance. The
> performance bottleneck caused by a single
> NameNode can be circumvented by assigning different NameNodes to different
> fragments as the primary ones
> (or the entry points).
>
> 3. The highly-reliable storage shared by the NameNodes is removed by
> introducing message-based consistency
> mechanism among the NameNodes.  The architecture requires only commodity
> hardware.
>
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>