HDFS, mail # user - DFS and the RecordReader


Re: DFS and the RecordReader
Harsh J 2012-12-06, 22:15
Hi,

Not sure what you're referring to. A RecordReader, or for that matter
any DFS InputStream, does not copy data to local disk before reading it.
Non-data-local reads are streamed over the network, just as regular
data-local reads are streamed from the local disk.

There is no such logic as the one you seek.
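
To illustrate the point, here is a minimal sketch of a reader that consumes records straight from a stream. It assumes plain java.io rather than Hadoop's own classes, and the names StreamingRecordSketch and readRecords are hypothetical. In Hadoop the stream would typically be the one returned by FileSystem.open(Path); here a ByteArrayInputStream stands in for it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is not Hadoop's RecordReader API.
public class StreamingRecordSketch {

    // Reads newline-delimited records directly from the stream it is handed.
    // It never checks whether the bytes are local and never stages a copy
    // of the file on disk; it just pulls bytes from the stream as it goes.
    static List<String> readRecords(InputStream in) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a stream over file data (assumption: in a real job
        // this would be backed by a local disk read or a network read).
        InputStream in = new ByteArrayInputStream(
                "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readRecords(in)); // prints [a, b, c]
    }
}
```

The reader's code is the same whether the stream is served from a local disk or over the network; where the bytes come from is decided underneath the stream, not in the reader.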

On Fri, Dec 7, 2012 at 3:07 AM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> Hi guys:
>
> Where and how does Hadoop's record reader decide whether or not it needs to
> copy a file to local disk?
>
> Clearly, since the InputSplit (which has metadata about file inputs) is the
> input to the RecordReader, the RecordReader would have to implement some
> kind of smart decision making ... I'm looking for something like
>
> // Pseudocode
> if (!file.existsLocally())
>    copyFileToDisk(file.getPath());
>
> return new InputStream(file);
>
> I've looked here:
>
> http://grepcode.com/file/repo1.maven.org/maven2/org.jvnet.hudson.hadoop/hadoop-core/0.19.1-hudson-2/org/apache/hadoop/hdfs/DFSClient.java#DFSClient.create%28java.lang.String%2Corg.apache.hadoop.fs.permission.FsPermission%2Cboolean%2Cshort%2Clong%2Corg.apache.hadoop.util.Progressable%2Cint%29
>
> but don't see anything.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com

--
Harsh J