HDFS, mail # user - Re: Question about writing HDFS files


Harsh J 2013-05-17, 05:12
Thanks for the clarification, Rahul. In that case the reading is
correct (and an HDFS client behaves the same in and out of MR;
it's not really related to MR at all).

A "client outside" would write to a random set of datanodes, spread
across at least two racks for 3 replicas if rack awareness is turned on.
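
To make the placement concrete, here is a rough Python sketch of how the
default policy is usually described: first replica on the writer's own node
if the writer is a datanode (otherwise a random node), second replica on a
different rack, third on the same rack as the second but on a different
node. This is only an illustration, not the actual Hadoop code. The function
name, the rack and node names, and the topology dict are all made up, and it
assumes at least two racks with at least two datanodes each.

```python
import random


def choose_replicas(topology, writer=None, rng=random):
    """Pick three datanodes for one block (illustrative sketch only).

    topology: dict of rack name -> list of datanode names (hypothetical).
    writer:   host issuing the write; anything that is not a datanode in
              `topology` is treated as a client outside the cluster.
    """
    node_rack = {n: rack for rack, nodes in topology.items() for n in nodes}
    all_nodes = sorted(node_rack)

    # First replica: the writer itself if it is a datanode, else a random node.
    first = writer if writer in node_rack else rng.choice(all_nodes)

    # Second replica: any node on a different rack from the first.
    remote = [n for n in all_nodes if node_rack[n] != node_rack[first]]
    if not remote:
        raise ValueError("sketch assumes at least two racks")
    second = rng.choice(remote)

    # Third replica: same rack as the second, but a different node.
    peers = [n for n in topology[node_rack[second]] if n != second]
    if not peers:
        raise ValueError("sketch assumes two or more datanodes per rack")
    third = rng.choice(peers)

    return [first, second, third]


if __name__ == "__main__":
    racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
    print(choose_replicas(racks, writer="dn1"))        # writer is a datanode
    print(choose_replicas(racks, writer="edge-host"))  # client outside
```

With writer="dn1" the first entry is always dn1; with an outside host the
first entry is random, but in both cases the three replicas end up on
exactly two racks.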

On Fri, May 17, 2013 at 8:17 AM, Rahul Bhattacharjee
<[EMAIL PROTECTED]> wrote:
> Hi Harsh,
>
> I think what John meant by writing to local disk is writing first to the
> same datanode that initiated the write call.
>
> John can further clarify.
>
>
> On Fri, May 17, 2013 at 4:23 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>> That is not true. HDFS writes are not staged to a local disk first
>> before being written onto the DataNodes. The old architecture docs
>> seem to suggest that the writes get staged to a local disk, but that's
>> not true anymore; see https://issues.apache.org/jira/browse/HDFS-1454.
>>
>> Also worth noting that an HDFS client behaves the same way in almost
>> all contexts, whether it's invoked from an MR framework or directly
>> from the shell.
>>
>> On Fri, May 17, 2013 at 3:38 AM, John Lilley <[EMAIL PROTECTED]>
>> wrote:
>> > I seem to recall reading that when a MapReduce task writes a file, the
>> > blocks of the file are always written to local disk, and replicated to
>> > other nodes.  If this is true, is this also true for non-MR applications
>> > writing to HDFS from Hadoop worker nodes?  What about clients outside of
>> > the cluster doing a file load?
>> >
>> > Thanks
>> >
>> > John
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>
>

--
Harsh J