Re: Distributed Cache
     Path[] cachedFilePaths =
         DistributedCache.getLocalCacheFiles(context.getConfiguration());

     for (Path cachedFilePath : cachedFilePaths) {
       File cachedFile = new File(cachedFilePath.toUri().getRawPath());
       System.out.println("cached file path >> " + cachedFile.getAbsolutePath());
     }

I hope this helps for the time being. JobContext was supposed to replace the
DistributedCache API (which will be deprecated), but there is some problem
with that, or I am missing something... Will reply if I find the solution to
it.
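The toUri().getRawPath() step in the snippet above is the part that trips people up. Outside of Hadoop it boils down to plain Java; here is a self-contained sketch (the URI below is made up for illustration, it is not a real cache path):

```java
import java.io.File;
import java.net.URI;

public class RawPathDemo {
    public static void main(String[] args) {
        // A file: URI like the ones handed back for a localized cache file.
        URI cachedUri = URI.create("file:///tmp/appcache/stopwords.txt");

        // getRawPath() strips the scheme, leaving a plain filesystem path.
        File cachedFile = new File(cachedUri.getRawPath());

        System.out.println("cached file path >> " + cachedFile.getAbsolutePath());
        // prints: cached file path >> /tmp/appcache/stopwords.txt
    }
}
```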

getCacheFiles will give you the URIs used for localizing the files (the
original URIs used when adding them to the cache).

getLocalCacheFiles will give you the actual file paths on the node manager.
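Since the original question was about a cached folder of several .txt files: once you have the local path, reading each line is ordinary Java I/O. A minimal, self-contained sketch (the directory and file names here are invented stand-ins; in a real job the path would come from getLocalCacheFiles):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadCachedDir {
    public static void main(String[] args) throws IOException {
        // Stand-in for a directory the node manager localized from the cache.
        Path cacheDir = Files.createTempDirectory("cache");
        Files.write(cacheDir.resolve("a.txt"),
                List.of("alpha", "beta"), StandardCharsets.UTF_8);
        Files.write(cacheDir.resolve("b.txt"),
                List.of("gamma"), StandardCharsets.UTF_8);

        // Walk the .txt files and read each one line by line.
        try (DirectoryStream<Path> txts =
                 Files.newDirectoryStream(cacheDir, "*.txt")) {
            for (Path txt : txts) {
                for (String line : Files.readAllLines(txt, StandardCharsets.UTF_8)) {
                    System.out.println(line);
                }
            }
        }
    }
}
```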

Thanks,
Omkar Joshi
*Hortonworks Inc.* <http://www.hortonworks.com>
On Wed, Jul 10, 2013 at 2:43 PM, Botelho, Andrew <[EMAIL PROTECTED]> wrote:

> Ok so JobContext.getCacheFiles() returns URI[].
>
> Let's say I only stored one folder in the cache that has several .txt
> files within it.  How do I use that returned URI to read each line of those
> .txt files?
>
> Basically, how do I read my cached file(s) after I call
> JobContext.getCacheFiles()?
>
> Thanks,
>
> Andrew
>
> *From:* Omkar Joshi [mailto:[EMAIL PROTECTED]]
> *Sent:* Wednesday, July 10, 2013 5:15 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Distributed Cache
>
> try JobContext.getCacheFiles()
>
> Thanks,
>
> Omkar Joshi
>
> *Hortonworks Inc.* <http://www.hortonworks.com>
>
> On Wed, Jul 10, 2013 at 6:31 AM, Botelho, Andrew <[EMAIL PROTECTED]>
> wrote:
>
> Ok using job.addCacheFile() seems to compile correctly.
>
> However, how do I then access the cached file in my Mapper code?  Is there
> a method that will look for any files in the cache?
>
> Thanks,
>
> Andrew
>
> *From:* Ted Yu [mailto:[EMAIL PROTECTED]]
> *Sent:* Tuesday, July 09, 2013 6:08 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Distributed Cache
>
> You should use Job#addCacheFile()
>
> Cheers
>
> On Tue, Jul 9, 2013 at 3:02 PM, Botelho, Andrew <[EMAIL PROTECTED]>
> wrote:
>
> Hi,
>
> I was wondering if I can still use the DistributedCache class in the
> latest release of Hadoop (Version 2.0.5).
>
> In my driver class, I use this code to try and add a file to the
> distributed cache:
>
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.filecache.DistributedCache;
> import org.apache.hadoop.fs.*;
> import org.apache.hadoop.io.*;
> import org.apache.hadoop.mapreduce.*;
> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> Configuration conf = new Configuration();
> DistributedCache.addCacheFile(new URI("file path in HDFS"), conf);
> Job job = Job.getInstance();
> …
>
> However, I keep getting warnings that the method addCacheFile() is
> deprecated.
>
> Is there a more current way to add files to the distributed cache?
>
> Thanks in advance,
>
> Andrew