MapReduce, mail # user - Re: Distributed Cache


RE: Distributed Cache
Botelho, Andrew 2013-07-10, 13:31
OK, using job.addCacheFile() compiles correctly.
However, how do I then access the cached file in my Mapper code? Is there a method that returns the files in the cache?

Thanks,

Andrew
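
One possible way to approach the question above, offered as a sketch and not as a confirmed answer from the thread: in Hadoop 2.x the Mapper's setup(Context) can call context.getCacheFiles(), which returns the URIs registered with job.addCacheFile(). The helper below is hypothetical (CacheFileLookup, localName, and load are illustrative names, not Hadoop APIs) and uses only the Java standard library. It assumes the cached file is linked into the task's working directory under the URI fragment if one was given, otherwise under the file's base name, and that the file holds tab-separated key/value lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop. Intended to be called from
// Mapper#setup(Context) with a URI taken from context.getCacheFiles().
final class CacheFileLookup {

    // Name under which the file is assumed to appear in the task's working
    // directory: the URI fragment if present (e.g. "...#lookup"), otherwise
    // the last component of the path. This naming rule is an assumption.
    static String localName(URI cacheFile) {
        String fragment = cacheFile.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheFile.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Reads "key<TAB>value" lines into a map; lines without a tab are skipped.
    static Map<String, String> load(Path localFile) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    table.put(parts[0], parts[1]);
                }
            }
        }
        return table;
    }
}
```

Inside setup(), this might be used as CacheFileLookup.load(java.nio.file.Paths.get(CacheFileLookup.localName(context.getCacheFiles()[0]))), with the resulting map kept in a field for use in map().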

From: Ted Yu [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 09, 2013 6:08 PM
To: [EMAIL PROTECTED]
Subject: Re: Distributed Cache

You should use Job#addCacheFile()

Cheers
On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <[EMAIL PROTECTED]> wrote:
Hi,

I was wondering whether I can still use the DistributedCache class in the latest release of Hadoop (version 2.0.5).
In my driver class, I use this code to try to add a file to the distributed cache:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
Job job = Job.getInstance(conf);  // pass conf so the cache entry reaches the job
...

However, I keep getting warnings that the method addCacheFile() is deprecated.
Is there a more current way to add files to the distributed cache?

Thanks in advance,

Andrew
