Re: FileNotFoundException when getting files from DistributedCache
Barak Yaish 2012-11-22, 21:09
Thanks, that works :-)
On Thu, Nov 22, 2012 at 10:50 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> DistributedCache files in tasks are located locally (not on HDFS), so
> use the LocalFileSystem or java.io.File if you prefer that, to read
> them from within tasks.
>
> On Fri, Nov 23, 2012 at 2:16 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> > Thanks for the quick response.
> >
> > I wanted to use DistributedCache to localize the files of interest to all
> > nodes, so which API should I use in order to be able to read all those
> > files, regardless of the node running the mapper?
> >
> > On Thu, Nov 22, 2012 at 10:38 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >> You pointed out that you use:
> >>
> >> FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path )
> >>
> >> Note that this (FileSystem.get) will return an HDFS FileSystem by
> >> default and your path is a local one. You can either use simple
> >> java.io.File APIs or use
> >> FileSystem.getLocal(context.getConfiguration()) [1] to get a local
> >> filesystem handle that can look in file:/// FSes rather than hdfs://
> >> paths.
> >>
> >> [1] http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getLocal(org.apache.hadoop.conf.Configuration)
> >>
> >> On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> >> > Hi,
> >> >
> >> > I have a 2-node cluster (v1.04), master and slave. On the master, in
> >> > Tool.run() we add two files to the DistributedCache using
> >> > addCacheFile(). The files do exist in HDFS. In Mapper.setup() we want
> >> > to retrieve those files from the cache using
> >> > FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path ).
> >> > The problem is that for one file a FileNotFoundException is thrown,
> >> > although the file exists on the slave node:
> >> >
> >> > attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException: File does not exist:
> >> > /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >> >
> >> > ls -l on the slave:
> >> >
> >> > [hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >> > -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> >> > [hduser@slave ~]$
> >> >
> >> > My questions are:
> >> >
> >> > Shouldn't all files exist on all nodes?
> >> > What should be done to fix that?
> >> >
> >> > Thanks.
> >>
> >> --
> >> Harsh J
>
> --
> Harsh J
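
For anyone landing on this thread later: the fix discussed above comes down to reading the localized cache file from the task node's local disk instead of through the default (HDFS) FileSystem. Below is a minimal sketch of the java.io route, not the original poster's code. The class name LocalCacheFileReader and the method readLines are hypothetical, and it assumes the local path has already been obtained inside Mapper.setup(), for example from DistributedCache.getLocalCacheFiles(conf). The other route mentioned in the thread, FileSystem.getLocal(conf).open(path), needs the Hadoop jars on the classpath and is only noted in a comment here.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a DistributedCache file that has already been
// localized onto the task node's local disk, using plain java.io only.
class LocalCacheFileReader {

    // localPath is assumed to be a local filesystem path (not an hdfs:// one),
    // e.g. the string form of an entry from DistributedCache.getLocalCacheFiles(conf).
    // With Hadoop on the classpath, FileSystem.getLocal(conf).open(path) would be
    // the equivalent way to get a stream for the same file.
    static List<String> readLines(String localPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(localPath), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In a mapper, readLines would be called once per localized file from setup(), so that the contents are in memory before map() runs.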