HDFS user mailing list
Re: FileNotFoundException when getting files from DistributedCache
Thanks for the quick response.

I wanted to use DistributedCache to localize the files of interest to all
nodes, so which API should I use to read those files, regardless of which
node runs the mapper?

On Thu, Nov 22, 2012 at 10:38 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> You pointed out that you use:
>
> FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path )
>
> Note that FileSystem.get returns an HDFS FileSystem by default, while
> your path is a local one. You can either use plain java.io.File APIs or
> use FileSystem.getLocal(context.getConfiguration()) [1] to get a local
> filesystem handle that looks up file:/// paths rather than hdfs://
> paths.
>
> [1]
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getLocal(org.apache.hadoop.conf.Configuration)
>
> On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have a 2-node cluster (v1.04): a master and a slave. On the master, in
> > Tool.run(), we add two files to the DistributedCache using addCacheFile().
> > The files do exist in HDFS. In Mapper.setup() we want to retrieve those
> > files from the cache using FSDataInputStream fs = FileSystem.get(
> > context.getConfiguration() ).open( path ). The problem is that for one
> > file a FileNotFoundException is thrown, although the file exists on the
> > slave node:
> >
> > attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException: File does not exist: /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >
> > ls -l on the slave:
> >
> > [hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> > -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> > [hduser@slave ~]$
> >
> > My questions are:
> >
> > Shouldn't all files exist on all nodes?
> > What should be done to fix that?
> >
> > Thanks.
>
>
>
> --
> Harsh J
>
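A minimal sketch of the java.io.File route suggested in the reply, assuming the mapper already holds the localized path as a plain string. In Hadoop 1.x such paths are typically obtained in setup() from DistributedCache.getLocalCacheFiles(context.getConfiguration()), and a Path's toUri().getPath() yields the local string; that wiring is an assumption and is not shown here. The class LocalCacheReader and the method readLocalCacheFile are hypothetical names chosen for illustration. Because the localized path points at the node's own disk, it is opened with plain Java I/O and never handed to the HDFS handle that FileSystem.get() returns.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalCacheReader {

    // Hypothetical helper: reads every line of a file that the DistributedCache
    // has already copied onto this node's local disk. The argument is a plain
    // local path, so it is resolved with java.io.File, not against HDFS.
    public static List<String> readLocalCacheFile(String localPath) throws IOException {
        File file = new File(localPath);
        if (!file.isFile()) {
            // Only raised when the file is genuinely absent from the local disk.
            throw new FileNotFoundException("Not a local file: " + localPath);
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a localized cache file such as .../analytics/1.csv:
        // write a small temporary CSV, then read it back by its local path.
        File tmp = File.createTempFile("analytics", ".csv");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), Arrays.asList("id,value", "1,42"), StandardCharsets.UTF_8);

        List<String> lines = readLocalCacheFile(tmp.getAbsolutePath());
        System.out.println("read " + lines.size() + " lines, header: " + lines.get(0));
    }
}
```

Inside a real Mapper.setup(), the Hadoop-side equivalent mentioned in the reply is FileSystem.getLocal(context.getConfiguration()).open(path), which opens the same local file through Hadoop's local filesystem instead of HDFS.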