HDFS, mail # user - FileNotFoundException when getting files from DistributedCache


Re: FileNotFoundException when getting files from DistributedCache
Barak Yaish 2012-11-22, 21:09
Thanks, that works :-)

On Thu, Nov 22, 2012 at 10:50 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> DistributedCache files in tasks are located locally (not on HDFS), so
> use the LocalFileSystem, or plain java.io.File if you prefer, to read
> them from within tasks.
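
A minimal sketch of this approach in Mapper.setup(), assuming the Hadoop
1.x API (org.apache.hadoop.filecache.DistributedCache); the class name and
read loop are illustrative, not code from the thread:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheReadingMapper extends Mapper<LongWritable, Text, Text, Text> {

      @Override
      protected void setup(Context context) throws IOException {
        // getLocalCacheFiles returns task-local paths (on the node's
        // local disk), so plain java.io readers work here.
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached == null) {
          return;
        }
        for (Path p : cached) {
          BufferedReader reader = new BufferedReader(new FileReader(p.toString()));
          try {
            String line;
            while ((line = reader.readLine()) != null) {
              // ... parse the cached data ...
            }
          } finally {
            reader.close();
          }
        }
      }
    }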
>
> On Fri, Nov 23, 2012 at 2:16 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> > Thanks for the quick response.
> >
> > I wanted to use DistributedCache to localize the files of interest to
> > all nodes, so which API should I use in order to be able to read all
> > those files, regardless of the node running the mapper?
> >
> >
> > On Thu, Nov 22, 2012 at 10:38 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> You pointed out that you use:
> >>
> >> FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path )
> >>
> >> Note that this (FileSystem.get) will return an HDFS FileSystem by
> >> default, and your path is a local one. You can either use simple
> >> java.io.File APIs or use
> >> FileSystem.getLocal(context.getConfiguration()) [1] to get a local
> >> filesystem handle that looks at file:/// paths rather than hdfs://
> >> paths.
> >>
> >> [1]
> >> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getLocal(org.apache.hadoop.conf.Configuration)
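
For reference, a short sketch of the getLocal() route described in [1]; a
hypothetical helper, not code from the thread:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalOpen {
      // Opens a task-local path (e.g. a localized DistributedCache file)
      // through a file:/// filesystem handle instead of the default hdfs://.
      public static FSDataInputStream openLocal(Configuration conf, Path localPath)
          throws IOException {
        LocalFileSystem localFs = FileSystem.getLocal(conf);
        return localFs.open(localPath);
      }
    }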
> >>
> >> On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> >
> >> > I have a 2-node cluster (v1.04), master and slave. On the master, in
> >> > Tool.run() we add two files to the DistributedCache using
> >> > addCacheFile(). The files do exist in HDFS. In Mapper.setup() we want
> >> > to retrieve those files from the cache using FSDataInputStream fs =
> >> > FileSystem.get( context.getConfiguration() ).open( path ). The
> >> > problem is that for one file a FileNotFoundException is thrown,
> >> > although the file exists on the slave node:
> >> >
> >> > attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException:
> >> > File does not exist:
> >> > /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >> >
> >> > ls -l on the slave:
> >> >
> >> > [hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >> > -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >> > [hduser@slave ~]$
> >> >
> >> > My questions are:
> >> >
> >> > Shouldn't all files exist on all nodes?
> >> > What should be done to fix that?
> >> >
> >> > Thanks.
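
For context, the driver-side registration being described might look
roughly like this (a sketch against the Hadoop 1.x API; the HDFS paths
are illustrative):

    import java.net.URI;
    import java.net.URISyntaxException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheSetup {
      // Call from Tool.run() before job submission; the files must
      // already exist in HDFS so the framework can localize them.
      public static void addCacheFiles(Configuration conf) throws URISyntaxException {
        DistributedCache.addCacheFile(new URI("/tmp/analytics/1.csv"), conf);
        DistributedCache.addCacheFile(new URI("/tmp/analytics/2.csv"), conf);
      }
    }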
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>