MapReduce >> mail # user >> Can you access Distributed cache in custom output format ?


Re: Can you access Distributed cache in custom output format ?
So, the -files uses the '#' symlink feature, correct?

If so, calling new File(<FILENAME>) from the MR task JVM would work, where
FILENAME does not include the path of the file, correct?

Thxs.

Alejandro
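
A minimal sketch of the approach discussed in this thread, assuming the -files option leaves the file (or a symlink to it) in the task's current working directory under its bare name. The class name CwdCacheFileReader, the helper readLines, and the file name lookup.cpt are hypothetical, and only java.io is used, so no Hadoop API is assumed. It opens the file from the local filesystem by name alone, rather than through FileSystem.get(conf), which usually resolves to the job's default filesystem (typically HDFS) and not the task's local directory.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CwdCacheFileReader {

    // Reads every line of a local file into a list.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; a bare name is resolved against the
        // current working directory of the JVM.
        String name = args.length > 0 ? args[0] : "lookup.cpt";
        File cached = new File(name);

        if (!cached.exists()) {
            System.out.println(name + " not found in " + new File(".").getAbsolutePath());
            return;
        }
        for (String line : readLines(cached)) {
            System.out.println("Now parsing the line: " + line);
        }
    }
}
```

Printing the working directory when the file is missing is a cheap way to check, from inside the output format, whether the code is running in a task JVM or on the submitting client.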

On Fri, Jul 29, 2011 at 11:20 AM, Brock Noland <[EMAIL PROTECTED]> wrote:

> With -files the file will be placed in the CWD of the map/reduce tasks.
>
> You should be able to open the file with FileInputStream.
>
> Brock
>
> On Fri, Jul 29, 2011 at 1:18 PM, Mapred Learn <[EMAIL PROTECTED]> wrote:
>
>> When you use the -files option, it copies the file into a .staging directory
>> and all mappers can access it, but I see that the output format is not able
>> to access it.
>>
>> -files copies cache file under:
>>
>> /user/<id>/.staging/<job name>/files/<filename>
>>
>>
>>
>> On Fri, Jul 29, 2011 at 11:14 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>>
>>> Mmmh, I've never used the -files option (I don't know if it will copy the
>>> files to HDFS for you or whether you have to put them there first).
>>>
>>> My usage pattern for the DC is to copy the files to HDFS, then use the DC
>>> API to add those files to the jobconf.
>>>
>>> Alejandro
>>>
>>>
>>> On Fri, Jul 29, 2011 at 10:56 AM, Mapred Learn <[EMAIL PROTECTED]> wrote:
>>>
>>>> I'm trying to access a file that I passed with the -files option in my
>>>> hadoop jar command.
>>>>
>>>> In my output format, I am doing something like:
>>>>
>>>> Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
>>>>
>>>>         String file1="";
>>>>         String file2="";
>>>>         Path pt=null;
>>>>
>>>>         for (Path p : cacheFiles) {
>>>>
>>>>             if (p != null) {
>>>>                 if (p.getName().endsWith(".ryp")) {
>>>>                     file1 = p.getName();
>>>>                 } else if (p.getName().endsWith(".cpt")) {
>>>>                     file2 = p.getName();
>>>>                     pt=p;
>>>>                 }
>>>>
>>>>             }
>>>>
>>>>         }
>>>>
>>>> // then read the file, which gives a "file does not exist" exception:
>>>>
>>>> Path pat = new Path(file2);
>>>>
>>>>         BufferedReader reader = null;
>>>>         try {
>>>>             FileSystem fs = FileSystem.get(conf);
>>>>             reader=new BufferedReader(
>>>>                     new InputStreamReader(fs.open(pat)));
>>>>
>>>>
>>>>             String line = null;
>>>>             while ((line = reader.readLine()) != null) {
>>>>                 System.out.println("Now parsing the line: " + line);
>>>>
>>>>
>>>>             }
>>>>         } catch (Exception e) {
>>>>             System.out.println("exception" + e.getMessage());
>>>>
>>>>
>>>>         }
>>>>
>>>> On Fri, Jul 29, 2011 at 10:50 AM, Alejandro Abdelnur <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Where are you getting the error, in the client submitting the job or in
>>>>> the MR tasks?
>>>>>
>>>>> Are you trying to access a file or trying to set a JAR in the
>>>>> DistributedCache?
>>>>> How/when are you adding the file/JAR to the DC?
>>>>> How are you retrieving the file/JAR from your outputformat code?
>>>>>
>>>>> Thxs.
>>>>>
>>>>> Alejandro
>>>>>
>>>>>
>>>>> On Fri, Jul 29, 2011 at 10:43 AM, Mapred Learn <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I am trying to create a custom text output format in which I want to
>>>>>> access a distributed cache file.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 29, 2011 at 10:42 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> Mapred,
>>>>>>>
>>>>>>> By outputformat, do you mean the frontend, submit-time run of
>>>>>>> OutputFormat? Then no, it cannot access the distributed cache because
>>>>>>> it's not really set up at that point, and the front end doesn't really
>>>>>>> need the distributed cache when it can access those files directly.
>>>>>>>
>>>>>>> Could you describe in a bit more detail what you're attempting to do?
>>>>>>>
>>>>>>> On Fri, Jul 29, 2011 at 10:57 PM, Mapred Learn <[EMAIL PROTECTED]> wrote:
>>>>>>> > Hi,
>>>>>>> > I am trying to access distributed cache in my custom output format