Thanks for the clarification, Rahul. In that case, the reading is
correct (and an HDFS client behaves the same in and out of MR;
it's not really related to MR at all).
A "client outside" the cluster would write to a random set of DataNodes,
spread across at least two racks for 3 replicas if rack awareness is turned on.
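The placement rule described above (first replica on the writer's own DataNode when the writer is one, otherwise a random node, with 3 replicas spanning at least two racks) can be modelled roughly as follows. This is a simplified Python sketch, not Hadoop's code: the real logic is Java (`BlockPlacementPolicyDefault`) and also weighs load, free space and node health. The function `choose_targets`, the topology dict, and the node names are hypothetical. The sketch assumes rack awareness is on, with at least two racks and at least two nodes on every rack.

```python
import random
from typing import Dict, List, Optional


def choose_targets(
    topology: Dict[str, List[str]],
    writer: Optional[str],
    replicas: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical, simplified rack-aware placement for a single block.

    topology maps a rack name to the DataNodes on that rack.
    writer is the host issuing the write (None or a non-DataNode host
    models a client outside the cluster).
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    all_nodes = list(rack_of)
    chosen: List[str] = []

    # Replica 1: the writer's own DataNode if it is one, else a random node.
    first = writer if writer in rack_of else rng.choice(all_nodes)
    chosen.append(first)

    if replicas >= 2:
        # Replica 2: any node on a different rack than replica 1.
        remote = [n for n in all_nodes if rack_of[n] != rack_of[first]]
        second = rng.choice(remote)
        chosen.append(second)

        if replicas >= 3:
            # Replica 3: another node on the same (remote) rack as replica 2.
            same_rack = [n for n in topology[rack_of[second]] if n not in chosen]
            chosen.append(rng.choice(same_rack))

    # Any extra replicas: random nodes not already holding this block.
    while len(chosen) < replicas:
        chosen.append(rng.choice([n for n in all_nodes if n not in chosen]))
    return chosen


if __name__ == "__main__":
    topo = {
        "/rack1": ["dn1", "dn2", "dn3"],
        "/rack2": ["dn4", "dn5", "dn6"],
    }
    rng = random.Random(0)
    # Writer is a DataNode (e.g. a task on a worker node): dn2 comes first.
    print(choose_targets(topo, writer="dn2", rng=rng))
    # Writer is outside the cluster: the first target is any random node.
    print(choose_targets(topo, writer="edge-host", rng=rng))
```

Because replicas 2 and 3 share a rack, three replicas land on exactly two racks. This keeps the cross-rack traffic in the write pipeline down while still surviving the loss of a whole rack.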
On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee
<[EMAIL PROTECTED]> wrote:
> Hi Harsh,
> I think what John meant by writing to local disk is writing first to the
> same DataNode on which the write call was initiated.
> John can further clarify.
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley <[EMAIL PROTECTED]> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file, the
>> > blocks of the file are always written to local disk, and replicated to
>> > other nodes. If this is true, is this also true for non-MR applications
>> > writing to HDFS from Hadoop worker nodes? What about clients outside of
>> > the cluster doing a file load?
>> > Thanks
>> > John
>> Harsh J