Re: When copying a file to HDFS, how to control what nodes that file will reside on?
Mohammad Mustaqeem 2013-04-10, 06:11
Which Java file is responsible for replication?
Which file chooses a random data node from the same rack, and which chooses a random
On Wed, Apr 10, 2013 at 3:26 AM, Raj Vishwanathan <[EMAIL PROTECTED]> wrote:
> You could use the following facts.
> 1. Files are stored in blocks, so make your block size bigger than the
> largest file.
> 2. The first replica of each block is stored on the local node, provided
> the client writing the file runs on a DataNode.
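To make the two facts above concrete, here is a rough sketch of how one might pick a block size that keeps each file in a single block. The class name, method name, and sample file sizes are made up for illustration. The 512-byte checksum chunk is an assumption based on the usual io.bytes.per.checksum default (HDFS expects the block size to be a multiple of it), so check the value on your own cluster.

```java
// Hypothetical helper, not part of Hadoop: computes a per-file block size
// that is strictly larger than the largest file to be copied.
class BlockSizeSketch {
    // Assumed checksum chunk size (io.bytes.per.checksum); verify on your cluster.
    static final int BYTES_PER_CHECKSUM = 512;

    // Returns the smallest multiple of bytesPerChecksum that is strictly
    // greater than largestFileBytes, so the whole file fits in one block.
    static long blockSizeFor(long largestFileBytes, int bytesPerChecksum) {
        if (largestFileBytes < 0 || bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and chunk size positive");
        }
        long needed = largestFileBytes + 1; // "bigger than the largest file"
        long chunks = (needed + bytesPerChecksum - 1) / bytesPerChecksum; // round up
        return chunks * bytesPerChecksum;
    }

    public static void main(String[] args) {
        // Illustrative file sizes in bytes (File A, File B); replace with real ones.
        long[] fileSizes = {150_000_000L, 90_000_000L};
        long largest = 0;
        for (long size : fileSizes) {
            largest = Math.max(largest, size);
        }
        long blockSize = blockSizeFor(largest, BYTES_PER_CHECKSUM);
        System.out.println("Suggested block size in bytes: " + blockSize);
        // The value could then be passed as the blockSize argument of
        // FileSystem.create(path, overwrite, bufferSize, replication, blockSize),
        // or set as dfs.block.size (named dfs.blocksize in Hadoop 2.x) for the copy.
    }
}
```

The copy itself would then have to be run on Machine A for File A, on Machine B for File B, and so on, since the local replica only lands on the machine doing the write if that machine is a DataNode. Some clusters also enforce a minimum block size, which this sketch does not account for.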
> *From:* jeremy p <[EMAIL PROTECTED]>
> *To:* [EMAIL PROTECTED]
> *Sent:* Tuesday, April 9, 2013 1:49 PM
> *Subject:* When copying a file to HDFS, how to control what nodes that
> file will reside on?
> Hey all,
> I'm dealing with kind of a bizarre use case where I need to make sure that
> File A is local to Machine A, File B is local to Machine B, etc. When
> copying a file to HDFS, is there a way to control which machines that file
> will reside on? I know that any given file will be replicated across three
> machines, but I need to be able to say "File A will DEFINITELY exist on
> Machine A". I don't really care about the other two machines -- they could
> be any machines on my cluster.
> Thank you.
*With regards ---*