Hadoop >> mail # user >> Adding a soft-linked archive file to the distributed cache doesn't work as advertised


Re: Adding a soft-linked archive file to the distributed cache doesn't work as advertised
Bill,

In addition, you must call DistributedCache.createSymlink(configuration);
that should do it.

Thanks.

Alejandro

On Mon, Jan 9, 2012 at 10:30 AM, W.P. McNeill <[EMAIL PROTECTED]> wrote:

> I am trying to add a zip file to the distributed cache and have it unzipped
> on the task nodes with a softlink to the unzipped directory placed in the
> working directory of my mapper process. I think I'm doing everything the
> way the documentation tells me to, but it's not working.
>
> On the client in the run() function while I'm creating the job I first
> call:
>
> fs.copyFromLocalFile(new Path("gate-app.zip"), new Path("/tmp/gate-app.zip"));
>
> As expected, this copies the archive file gate-app.zip to the HDFS
> directory /tmp.
>
> Then I call
>
> DistributedCache.addCacheArchive(new URI("/tmp/gate-app.zip#gate-app"),
> configuration);
>
> I expect this to add "/tmp/gate-app.zip" to the distributed cache and put a
> softlink to it called gate-app in the working directory of each task.
> However, when I call job.waitForCompletion(), I see the following error:
>
> Exception in thread "main" java.io.FileNotFoundException: File does not
> exist: /tmp/gate-app.zip#gate-app.
>
> It appears that the distributed cache mechanism is interpreting the entire
> URI as the literal name of the file, instead of treating the fragment as
> the name of the softlink.
>
> As far as I can tell, I'm doing this correctly according to the API
> documentation:
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
>
> The full project in which I'm doing this is up on github:
> https://github.com/wpm/Hadoop-GATE.
>
> Can someone tell me what I'm doing wrong?
>
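
To illustrate the path-versus-fragment distinction described in the quoted message, here is a minimal Java sketch that uses only java.net.URI. It shows how a string of the form path#name splits into a path and a fragment when parsed as a whole URI, and how the same string keeps the '#' as part of the file name when it is supplied as a bare path component. The class and method names (CacheUriDemo, archivePath, linkName, literalPath) are hypothetical. The sketch only illustrates URI semantics; it is not a diagnosis of the error above and says nothing about Hadoop's internals.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CacheUriDemo {
    /** Parses the whole string as a URI and returns its path component. */
    static String archivePath(String spec) {
        return URI.create(spec).getPath();
    }

    /** Parses the whole string as a URI and returns its fragment (null if there is no '#'). */
    static String linkName(String spec) {
        return URI.create(spec).getFragment();
    }

    /**
     * Builds a URI in which the whole string is treated as the path component.
     * The multi-argument constructor quotes characters that are illegal in a
     * path, so a '#' becomes part of the file name rather than a fragment marker.
     */
    static URI literalPath(String path) {
        try {
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String spec = "/tmp/gate-app.zip#gate-app";

        // Parsed as a URI: the path and the link name are separate components.
        System.out.println("path:     " + archivePath(spec));
        System.out.println("fragment: " + linkName(spec));

        // Treated as a literal path: no fragment, and '#' is part of the path.
        URI literal = literalPath(spec);
        System.out.println("path:     " + literal.getPath());
        System.out.println("fragment: " + literal.getFragment());
    }
}
```

If the URI handed to the distributed cache was built the second way, the fragment would be absent and the '#' would be looked up as part of the file name; whether that is what happens in the project above is an assumption that would need checking against its source.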