
MapReduce >> mail # user >> Can you access Distributed cache in custom output format ?


Re: Can you access Distributed cache in custom output format ?
Mmmh, I've never used the -files option (I don't know if it will copy the
files to HDFS for you or if you have to put them there first).

My usage pattern for the DC is to copy the files to HDFS, then use the DC API
to add those files to the jobconf.

Alejandro

On Fri, Jul 29, 2011 at 10:56 AM, Mapred Learn <[EMAIL PROTECTED]> wrote:

> I'm trying to access a file that I sent with the -files option in my hadoop jar
> command.
>
> In my OutputFormat,
> I am doing something like:
>
> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
>
> String file1 = "";
> String file2 = "";
> Path pt = null;
>
> for (Path p : cacheFiles) {
>     if (p != null) {
>         if (p.getName().endsWith(".ryp")) {
>             file1 = p.getName();
>         } else if (p.getName().endsWith(".cpt")) {
>             file2 = p.getName();
>             pt = p;
>         }
>     }
> }
>
> // then read the file, which throws a "file does not exist" exception:
>
> Path pat = new Path(file2);
>
> BufferedReader reader = null;
> try {
>     FileSystem fs = FileSystem.get(conf);
>     reader = new BufferedReader(
>             new InputStreamReader(fs.open(pat)));
>
>     String line = null;
>     while ((line = reader.readLine()) != null) {
>         System.out.println("Now parsing the line: " + line);
>     }
> } catch (Exception e) {
>     System.out.println("exception" + e.getMessage());
> }
>
> On Fri, Jul 29, 2011 at 10:50 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>
>> Where are you getting the error, in the client submitting the job or in
>> the MR tasks?
>>
>> Are you trying to access a file or trying to set a JAR in the
>> DistributedCache?
>> How/when are you adding the file/JAR to the DC?
>> How are you retrieving the file/JAR from your outputformat code?
>>
>> Thxs.
>>
>> Alejandro
>>
>>
>> On Fri, Jul 29, 2011 at 10:43 AM, Mapred Learn <[EMAIL PROTECTED]> wrote:
>>
>>> I am trying to create a custom text OutputFormat where I want to access a
>>> distributed cache file.
>>>
>>>
>>>
>>> On Fri, Jul 29, 2011 at 10:42 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>
>>>> Mapred,
>>>>
>>>> By outputformat, do you mean the frontend, submit-time run of
>>>> OutputFormat? Then no, it cannot access the distributed cache because
>>>> it's not really set up at that point, and the front end doesn't really
>>>> need the distributed cache when it can access those files directly.
>>>>
>>>> Could you describe in a bit more detail what you're attempting to do?
>>>>
>>>> On Fri, Jul 29, 2011 at 10:57 PM, Mapred Learn <[EMAIL PROTECTED]>
>>>> wrote:
>>>> > Hi,
>>>> > I am trying to access the distributed cache in my custom output format, but it
>>>> > does not work: opening the file in my custom output format fails with "file
>>>> > does not exist" even though it physically does.
>>>> >
>>>> > Looks like the distributed cache only works for Mappers and Reducers?
>>>> >
>>>> > Is there a way I can read the Distributed Cache in my custom output format?
>>>> >
>>>> > Thanks,
>>>> > -JJ
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>
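A hedged reading of the 10:56 AM code above: `file2 = p.getName()` keeps only the base name. `FileSystem.get(conf)` returns the default file system, typically HDFS. So `new Path(file2)` is resolved relative to the HDFS working directory instead of the task's local disk, which is where the localized cache copy lives. The sketch below keeps the full local path and reads the file from local disk.

It uses only the JDK so it can run on its own. It assumes the localized `Path`s have already been turned into strings, for example with `p.toString()` on each entry returned by `DistributedCache.getLocalCacheFiles(conf)`. The names `CacheFileReader`, `findByExtension` and `readLocalCacheFile` are hypothetical. A `null` array is treated as "nothing localized". That is an assumption about what code running on the submitting client may see, since the cache is not set up there.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): picks a localized cache file by
// extension and reads it from local disk using its FULL path, not its base name.
final class CacheFileReader {

    private CacheFileReader() {}

    // Returns the full local path of the first entry ending in `extension`,
    // or null if there is none. A null array is treated as "nothing localized"
    // (assumption: e.g. the code is running on the client at submit time).
    static String findByExtension(String[] localCachePaths, String extension) {
        if (localCachePaths == null) {
            return null;
        }
        for (String p : localCachePaths) {
            if (p != null && p.endsWith(extension)) {
                return p;
            }
        }
        return null;
    }

    // Reads every line of a file on the LOCAL file system (UTF-8 assumed);
    // try-with-resources closes the reader even if reading fails.
    static List<String> readLocalCacheFile(String fullLocalPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(fullLocalPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Inside Hadoop itself, the smaller change would be to open `pt` (the full localized path) through `FileSystem.getLocal(conf)` instead of opening `new Path(file2)` through `FileSystem.get(conf)`. In either form, this only helps where the cache has actually been localized. That means code running in the task, such as the `RecordWriter` returned by `getRecordWriter`. It does not apply to submit-time checks such as `checkOutputSpecs`.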