That is not true. HDFS writes are not staged to a local disk before
being written to the DataNodes; the client streams the data directly
to the DataNodes. The old architecture docs suggest that writes get
staged to a local disk, but that is no longer true; see
https://issues.apache.org/jira/browse/HDFS-1454.
Also worth noting: an HDFS client behaves the same way in almost
all contexts, whether it's invoked from an MR framework or directly.
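
To make the "no local staging" point concrete, here is a rough toy model
in Python. It is not Hadoop code: DataNode, receive and write_stream are
hypothetical names made up for illustration, and the tiny packet size is
only there to keep the example readable (real packets are far larger).
It only sketches the idea that each packet leaves the client as soon as
it is cut from the stream and is forwarded node to node, with no local
temp file in between.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy stand-in for a DataNode; `stored` holds the packets it received."""
    name: str
    stored: list = field(default_factory=list)

    def receive(self, packet: bytes, downstream: list) -> None:
        # Keep the packet, then forward it to the next node in the pipeline.
        self.stored.append(packet)
        if downstream:
            downstream[0].receive(packet, downstream[1:])


def write_stream(data: bytes, pipeline: list, packet_size: int = 4) -> int:
    """Send `data` packet by packet to the first node; return packets sent.

    Nothing is buffered in a local temp file: each packet is handed to the
    pipeline as soon as it is cut from the stream.
    """
    sent = 0
    for start in range(0, len(data), packet_size):
        pipeline[0].receive(data[start:start + packet_size], pipeline[1:])
        sent += 1
    return sent


# Hypothetical three-node pipeline (one node per replica).
nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
packets = write_stream(b"hello hdfs", nodes)
```

The same write_stream call works no matter who the caller is, which is
the point about the client behaving the same from MR or standalone code.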

On Fri, May 17, 2013 at 3:38 AM, John Lilley <[EMAIL PROTECTED]> wrote:
> I seem to recall reading that when a MapReduce task writes a file, the
> blocks of the file are always written to local disk, and replicated to other
> nodes. If this is true, is this also true for non-MR applications writing
> to HDFS from Hadoop worker nodes? What about clients outside of the cluster
> doing a file load?