MapReduce, mail # user - Re: When copying a file to HDFS, how to control what nodes that file will reside on?


Mohammad Mustaqeem 2013-04-10, 06:11
Which Java file is responsible for replica placement? Which file chooses a
random datanode from the same rack, and which chooses a random rack?
On Wed, Apr 10, 2013 at 3:26 AM, Raj Vishwanathan <[EMAIL PROTECTED]> wrote:

> You could use the following facts.
> 1. Files are stored in blocks, so make your block size bigger than the
> largest file.
> 2. The first replica of each block is stored on the local node (the node
> writing the file).
>
> Raj
>
>   ------------------------------
> *From:* jeremy p <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]
> *Sent:* Tuesday, April 9, 2013 1:49 PM
> *Subject:* When copying a file to HDFS, how to control what nodes that
> file will reside on?
>
> Hey all,
>
> I'm dealing with kind of a bizarre use case where I need to make sure that
> File A is local to Machine A, File B is local to Machine B, etc.  When
> copying a file to HDFS, is there a way to control which machines that file
> will reside on?  I know that any given file will be replicated across three
> machines, but I need to be able to say "File A will DEFINITELY exist on
> Machine A".  I don't really care about the other two machines -- they could
> be any machines on my cluster.
>
> Thank you.
>
>
>
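Raj's two facts combine into a practical recipe: run the copy from Machine A itself, with a block size larger than the file, so the whole file is a single block whose first replica lands on the local node. A minimal sketch, assuming the default HDFS placement policy; the hostname, paths, and the 256 MB block size are placeholders:

```shell
# Sketch, assuming the default HDFS block placement policy: when the HDFS
# client runs on a datanode, the first replica of each block it writes is
# placed on that same node. Choosing a block size larger than the file
# makes the whole file one block, so that block's first replica is local.
# machineA, /local/fileA.dat and /data/fileA.dat are placeholder names.
ssh machineA \
  'hadoop fs -D dfs.blocksize=268435456 -put /local/fileA.dat /data/fileA.dat'
```

Note this only pins the *first* replica; the other two still go wherever the placement policy puts them, which matches the requirement in the original question.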
--
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270