HDFS >> mail # dev >> [Discuss] Merge federation branch HDFS-1052 into trunk

Re: [Discuss] Merge federation branch HDFS-1052 into trunk
Suresh, Sanjay.

1. I have asked for benchmarks many times over the course of different
discussions on this topic. I don't see any numbers attached to the jira,
and I keep getting the same response Doug just got from you guys, which
is "why would the performance be worse?" That is not an argument for me.

2. I assume that merging requires a vote. I am sure people who know the
bylaws better than I do will correct me if that is not true.
Did I miss the vote?

It feels like you are rushing this and are not doing what you would
expect others to do in the same position, and what has been done in the
past for such large projects.

Thanks,
--Konstantin
On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:

> Suresh, Sanjay,
>
> Thank you very much for addressing my questions.
>
> Cheers,
>
> Doug
>
> On 04/26/2011 10:29 AM, suresh srinivas wrote:
> > Doug,
> >
> >
> >> 1. Can you please describe the significant advantages this approach has
> >> over a symlink-based approach?
> >
> > Federation is complementary to the symlink approach. You could choose to
> > provide an integrated namespace using symlinks. However, client side mount
> > tables seem a better approach for many reasons:
> > # Unlike symbolic links, client side mount tables can choose the right
> > namenode based on configuration. This avoids unnecessary RPCs to the
> > namenodes to discover the target of a symlink.
> > # The unavailability of a namenode where a symbolic link is configured
> > does not affect reaching the symlink target.
> > # Symbolic links need not be configured on every namenode in the cluster,
> > and future changes to symlinks need not be propagated to multiple
> > namenodes. In client side mount tables, this information is in a central
> > configuration.
> >
> > If a deployment still wants to use symbolic links, federation does not
> > preclude it.
> >
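The mount-table point quoted above can be illustrated with a minimal
sketch. It assumes a mount table is just a mapping from path prefixes to
namenode URIs loaded from client configuration; the function names, hosts
and ports below are hypothetical and are not taken from the federation
branch.

```python
from typing import Dict, Optional

# Hypothetical mount table, as it might be loaded from client configuration.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "hdfs://nn1.example.com:8020",
    "/data": "hdfs://nn2.example.com:8020",
    "/data/logs": "hdfs://nn3.example.com:8020",
}


def _is_under(path: str, mount_point: str) -> bool:
    """True if path equals mount_point or lies beneath it."""
    if mount_point == "/":
        return path.startswith("/")
    return path == mount_point or path.startswith(mount_point + "/")


def resolve_namenode(path: str, mount_table: Dict[str, str]) -> Optional[str]:
    """Return the namenode for the longest mount point covering path.

    The lookup uses only the local table, so no namenode is contacted
    to find out where the path lives. Returns None if nothing matches.
    """
    best = None
    for mount_point in mount_table:
        if _is_under(path, mount_point):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return mount_table[best] if best is not None else None
```

With this table, a path under /data/logs resolves to the third namenode
even if the namenode serving /data is down, because resolution never
touches a namenode.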
> >> It seems to me that one could run multiple namenodes on separate boxes
> >> and run multiple datanode processes per storage box
> >
> > There are several advantages to using a single datanode:
> > # When you have a large number of namenodes (say 20), the cost of running
> > separate datanodes in terms of process resources such as memory is huge.
> > # Disk I/O management and storage utilization are much better with a
> > single datanode, as it has a complete view of the storage.
> > # In the approach you are proposing, you have several clusters to manage.
> > However, with federation all datanodes are in a single cluster with a
> > single configuration, which is operationally easier to manage.
> >
> >> The patch modifies much of the logic of Hadoop's central component, upon
> >> which the performance and reliability of most other components of the
> >> ecosystem depend.
> > That is not true.
> >
> > # Namenode is mostly unchanged in this feature.
> > # Read/write pipelines are unchanged.
> > # The changes are mainly in datanode:
> > #* the storage, FSDataset, Directory and Disk scanners now have another
> > level to incorporate block pool ID into the hierarchy. This is not a
> > significant change that should cause performance or stability concerns.
> > #* datanodes use a separate thread per NN, just like the existing thread
> > that communicates with NN.
> >
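For readers following the datanode changes quoted above, here is a rough
sketch of what adding a block pool ID level to the storage hierarchy could
look like. It assumes each namenode owns one block pool; the class name,
directory layout and IDs are hypothetical and do not reproduce the actual
FSDataset code.

```python
import posixpath
from collections import defaultdict


class DataNodeStorage:
    """Toy model of one datanode storing blocks for several namenodes."""

    def __init__(self, root: str):
        self.root = root
        # block pool ID -> set of block IDs held for that pool
        self._blocks = defaultdict(set)

    def block_path(self, block_pool_id: str, block_id: int) -> str:
        # The block pool ID is the extra directory level (layout is assumed).
        return posixpath.join(self.root, block_pool_id, f"blk_{block_id}")

    def add_block(self, block_pool_id: str, block_id: int) -> str:
        self._blocks[block_pool_id].add(block_id)
        return self.block_path(block_pool_id, block_id)

    def block_report(self, block_pool_id: str) -> list:
        # Each namenode only hears about blocks in its own pool.
        return sorted(self._blocks.get(block_pool_id, ()))
```

In this model the same block ID can exist in two pools without colliding,
and a per-namenode thread would send only its own pool's report.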
> >> Can you please tell me how this has been tested beyond unit tests?
> > As regards testing, we have passed 600+ tests. In Hadoop, these tests
> > are mostly integration tests and not pure unit tests.
> >
> > While these tests have been extensive, we have also been testing this
> > branch for the last 4 months, with QA validation that reflects our
> > production environment. We have found the system to be stable and
> > performing well, and have not found any blockers with the branch so far.
> >
> > HDFS-1052 has been open for more than a year now. I had also sent an
> > email about this merge around 2 months ago. There are 90 subtasks that
> > have been worked on over the last couple of months under HDFS-1052. Given
> > that there was enough time