HDFS >> mail # user >> FileNotFoundException when getting files from DistributedCache


Re: FileNotFoundException when getting files from DistributedCache
You mentioned that you use:

FSDataInputStream fs = FileSystem.get( context.getConfiguration() ).open( path )

Note that FileSystem.get() returns the default FileSystem, which is HDFS
on your cluster, while your path is a local one. You can either use the
plain java.io.File APIs or call
FileSystem.getLocal(context.getConfiguration()) [1] to get a local
filesystem handle that resolves file:/// paths rather than hdfs://
paths.

[1] http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#getLocal(org.apache.hadoop.conf.Configuration)
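
As a rough illustration of the first option (plain java.io), here is a
minimal sketch that reads a file from the node's local disk. The class
name LocalCacheReader and the method readLines are made up for this
example and are not part of Hadoop. It assumes the task already holds
the local path of the cached copy, for instance one of the entries
returned by DistributedCache.getLocalCacheFiles(conf).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: reads a locally cached file
// with plain java.io instead of going through the default (HDFS) FileSystem.
public class LocalCacheReader {

    // localPath is assumed to be a path on the task node's local disk,
    // e.g. the toString() of a Path from DistributedCache.getLocalCacheFiles(conf).
    public static List<String> readLines(String localPath) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(new File(localPath)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            // Always release the file handle, even if reading fails.
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: LocalCacheReader <local-file>");
            return;
        }
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```

Inside Mapper.setup() the second option would instead be along the lines
of FileSystem.getLocal(context.getConfiguration()).open(path), which
keeps the FSDataInputStream-based code but points it at the local
filesystem.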

On Fri, Nov 23, 2012 at 2:04 AM, Barak Yaish <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a 2-node cluster (v1.04): a master and a slave. On the master, in
> Tool.run(), we add two files to the DistributedCache using addCacheFile().
> The files do exist in HDFS. In Mapper.setup() we want to retrieve those
> files from the cache using FSDataInputStream fs = FileSystem.get(
> context.getConfiguration() ).open( path ). The problem is that for one file
> a FileNotFoundException is thrown, although the file exists on the slave
> node:
>
> attempt_201211211227_0020_m_000000_2: java.io.FileNotFoundException: File
> does not exist:
> /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
>
> ls -l on the slave:
>
> [hduser@slave ~]$ ll /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> -rwxr-xr-x 1 hduser hadoop 42701 Nov 22 10:18 /somedir/hdp.tmp.dir/mapred/local/taskTracker/distcache/-7769715304990780/master/tmp/analytics/1.csv
> [hduser@slave ~]$
>
> My questions are:
>
> Shouldn't all files exist on all nodes?
> What should be done to fix that?
>
> Thanks.

--
Harsh J