Hi Anfernee,

You will get better performance from federation only if you spread your files across the multiple NNs. Federation lets multiple NNs share the same pool of DN storage, with the expectation that the namespace load will be distributed across those NNs. Each directory still belongs to exactly one NN's namespace, so if everything writes under the exact same parent directory, every create goes through that one NN and you gain nothing over a single NN. You will need to partition your jobs so that some write to one NN and other jobs write to the other NN(s).
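A minimal sketch of that partitioning, in Python, assuming two federated NNs reachable at the hypothetical URIs below and a hypothetical `/data/output` base path. Each job hashes its name to pick one namespace, so concurrent jobs spread their file creates across NNs instead of all landing under one parent directory on a single NN. The helper names `namenode_for_job` and `output_dir_for_job` are illustrative only, not part of any Hadoop API.

```python
import hashlib

# Hypothetical NameNode URIs, one per federated namespace (assumption).
NAMENODES = [
    "hdfs://nn1.example.com:8020",
    "hdfs://nn2.example.com:8020",
]


def namenode_for_job(job_name: str, namenodes=NAMENODES) -> str:
    """Deterministically assign a job to one namespace by hashing its name."""
    # MD5 gives a stable hash across processes, unlike Python's built-in hash().
    digest = hashlib.md5(job_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(namenodes)
    return namenodes[index]


def output_dir_for_job(job_name: str, base: str = "/data/output") -> str:
    """Build the job's output directory under its assigned namespace."""
    return f"{namenode_for_job(job_name)}{base}/{job_name}"


if __name__ == "__main__":
    for job in ["ingest-a", "ingest-b", "ingest-c", "ingest-d"]:
        print(job, "->", output_dir_for_job(job))
```

A stable hash keeps a given job on the same NN across runs; an explicit job-to-NN table would work just as well. Hadoop's ViewFs client-side mount table is another way to present several federated namespaces to jobs under a single path tree.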

I hope this helps!


On Jan 28, 2014, at 12:04 PM, Anfernee Xu <[EMAIL PROTECTED]> wrote:


Based on http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/Federation.html#Key_Benefits, overall performance can be improved by federation, but I'm not sure federation addresses my use case. Could someone elaborate on it?

My use case: I have a single NN and several DNs, and a bunch of concurrent MR jobs that create new files (plain files and subdirectories) under the same parent directory. My question is:

1) Will these concurrent writes (new plain files and subdirectories under the same parent directory) be serialized, because the write-once control is governed by the single NN?

I need this answer to estimate the necessity of moving to HDFS federation.

