Thanks for the quick response.
I wanted to use the DistributedCache to localize the files of interest to all
nodes, so which API should I use in order to be able to read all those
files, regardless of which node runs the mapper?
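For reference, something along these lines in Mapper.setup() is what I had in
mind; I'm not sure getLocalCacheFiles() is the right call here, and the
variable names below are just placeholders:

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Inside Mapper.setup(Context context):
// ask the framework where it localized the cached files on this node
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
if (localFiles != null) {
    for (Path p : localFiles) {
        // p points at this node's local copy, so open it with a local FS handle
        FSDataInputStream in = FileSystem.getLocal(context.getConfiguration()).open(p);
        // ... read the cached file ...
        in.close();
    }
}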
On Thu, Nov 22, 2012 at 10:38 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> You mentioned that you use:
> FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open(
> path )
> Note that this (FileSystem.get) will return an HDFS FileSystem by
> default, while your path is a local one. You can either use the plain
> java.io.File APIs or use
> FileSystem.getLocal(context.getConfiguration()) to get a local
> filesystem handle that looks at file:/// paths rather than hdfs:// ones.
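> For example, something roughly like this in your setup() should do, assuming
> your path variable already points at the file's location on the task node's
> local disk (the names here are only illustrative):
>
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Inside Mapper.setup(Context context):
> // getLocal() returns a file:/// filesystem handle instead of the default HDFS one
> FileSystem localFs = FileSystem.getLocal(context.getConfiguration());
> FSDataInputStream in = localFs.open(path); // path is a Path on this node's local disk
> BufferedReader reader = new BufferedReader(new InputStreamReader(in));
> String line;
> while ((line = reader.readLine()) != null) {
>     // process the cached file's contents
> }
> reader.close();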
> On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I have a 2-node cluster (v1.0.4), master and slave. On the master,
> > we add two files to the DistributedCache using addCacheFile(). The files do
> > exist in HDFS. In the Mapper.setup() we want to retrieve those files from
> > the cache using FSDataInputStream fs = FileSystem.get(
> > context.getConfiguration() ).open( path ). The problem is that for one of
> > the files a FileNotFoundException is thrown, although the file exists on the slave
> > node:
> > attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException: File
> > does not exist:
> > ls -l on the slave:
> > [hduser@slave ~]$ ll analytics/1.csv
> > -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 analytics/1.csv
> > [hduser@slave ~]$
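> > For completeness, the relevant code looks roughly like this (the HDFS path
> > below is just a placeholder):
> >
> > // Driver, run on the master: register the HDFS files for localization.
> > DistributedCache.addCacheFile(new URI("/path/in/hdfs/to/1.csv"), job.getConfiguration());
> >
> > // Mapper.setup(): this is the call that throws the FileNotFoundException.
> > FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path );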
> > My questions are:
> > Shouldn't all files exist on all nodes?
> > What should be done to fix that?
> > Thanks.
> Harsh J