Hadoop >> mail # user >> Re: How to modify hadoop-wordcount example to display File-wise results.


Re: How to modify hadoop-wordcount example to display File-wise results.
Aaron,

You can get the details of how much data each mapper processed, and on which node (its IP address, actually), from the job logs.

Raj
>________________________________
> From: Ajay Srivastava <[EMAIL PROTECTED]>
>To: "<[EMAIL PROTECTED]>" <[EMAIL PROTECTED]>
>Cc: "<[EMAIL PROTECTED]>" <[EMAIL PROTECTED]>
>Sent: Thursday, March 29, 2012 5:57 PM
>Subject: Re: How to modify hadoop-wordcount example to display File-wise results.
>
>Hi Aaron,
>I guess it can be done using counters.
>You can define a counter for each node in your cluster and then, in the map method, increment a node-specific counter by checking the hostname or IP address.
>It's not a very good solution, as you will need to modify your code whenever a node is added to or removed from the cluster, and there will be as many if conditions in the code as there are nodes. You can try this out if you do not find a cleaner solution. I wish this counter had been part of the predefined counters.
>
>
>Regards,
>Ajay Srivastava
>
>
>On 30-Mar-2012, at 12:49 AM, aaron_v wrote:
>
>>
>> Hi people, I am new to Nabble and Hadoop. I was having a look at the
>> wordcount program. Can someone please let me know how to find which data
>> gets mapped to which node? That is, I have a master node 0 and 4 other
>> nodes 1-4, and I ran the wordcount successfully. But I would like to print,
>> for each node, how much data it got from the input data file. Any suggestions?
>>
>> us latha wrote:
>>>
>>> Hi,
>>>
>>> Inside the map method, I made the following change to Example: WordCount
>>> v1.0 from the MapReduce tutorial at
>>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>>> ------------------
>>> String filename = new String();
>>> ...
>>> filename =  ((FileSplit) reporter.getInputSplit()).getPath().toString();
>>> while (tokenizer.hasMoreTokens()) {
>>>            word.set(tokenizer.nextToken()+" "+filename);
>>> --------------------
>>>
>>> Worked great!! Thanks to everyone!
>>>
>>> Regards,
>>> Srilatha
>>>
>>>
>>> On Sat, Oct 18, 2008 at 6:24 PM, Latha <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Thank you for your valuable inputs suggesting possible solutions for
>>>> creating an index file with the following format:
>>>> word1 filename count
>>>> word2 filename count
>>>>
>>>> However, the following is not working for me. Please help me resolve it.
>>>>
>>>> --------------------------
>>>> public static class Map extends MapReduceBase
>>>>         implements Mapper<LongWritable, Text, Text, Text> {
>>>>     private Text word = new Text();
>>>>     private Text filename = new Text();
>>>>
>>>>     public void map(LongWritable key, Text value,
>>>>             OutputCollector<Text, Text> output, Reporter reporter)
>>>>             throws IOException {
>>>>         filename.set(((FileSplit) reporter.getInputSplit()).getPath().toString());
>>>>         String line = value.toString();
>>>>         StringTokenizer tokenizer = new StringTokenizer(line);
>>>>         while (tokenizer.hasMoreTokens()) {
>>>>             word.set(tokenizer.nextToken());
>>>>             output.collect(word, filename);
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> public static class Reduce extends MapReduceBase
>>>>         implements Reducer<Text, Text, Text, Text> {
>>>>     public void reduce(Text key, Iterator<Text> values,
>>>>             OutputCollector<Text, Text> output, Reporter reporter)
>>>>             throws IOException {
>>>>         int sum = 0;
>>>>         Text filename;
>>>>         while (values.hasNext()) {
>>>>             sum++;
>>>>             filename.set(values.next().toString());
>>>>         }
>>>>         String file = filename.toString() + " " + (new IntWritable(sum)).toString();
>>>>         filename = new Text(file);
>>>>         output.collect(key, filename);
>>>>     }
>>>> }
>>>>
>>>> --------------------------
>>>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
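The likely failure in the quoted Reduce is the local `Text filename;`, which is used before being initialized (a compile error in Java); it also keeps only the last filename seen, so counts from different files would be merged under one name. As a rough illustration of the intended per-file counting, here is a plain-Java sketch (this is not the Hadoop API; `ReduceSketch` and `reduce` are made-up names for illustration):

```java
import java.util.*;

// Plain-Java sketch of what the reduce step intends: for one word (the key),
// count how many times each filename appears among the values, so the output
// can be emitted as "word filename count" per file.
public class ReduceSketch {
    static Map<String, Integer> reduce(Iterator<String> filenames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        while (filenames.hasNext()) {
            // One value per occurrence of the word in that file.
            counts.merge(filenames.next(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a.txt", "b.txt", "a.txt");
        System.out.println(reduce(values.iterator()));
    }
}
```

In the actual Hadoop Reduce, the equivalent fix would be to initialize the `Text` before the loop and accumulate per-filename counts rather than a single `sum`.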
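On Ajay's per-node counter idea: the one-if-per-node problem can be avoided if the counter name itself is derived from the hostname, since the old mapred `Reporter.incrCounter(String group, String counter, long amount)` overload accepts dynamically named counters. A minimal stand-alone sketch of that bookkeeping, with a plain map standing in for Hadoop's counter framework (`NodeCounterSketch` and the `"BytesPerNode"` group name are made up here):

```java
import java.net.InetAddress;
import java.util.*;

// Sketch of per-node data accounting: each record's size is added to a
// counter named after the local host. Inside a real map() this would be
// roughly reporter.incrCounter("BytesPerNode", hostname, recordLength).
public class NodeCounterSketch {
    static final Map<String, Long> counters = new HashMap<>();

    // Stand-in for Reporter.incrCounter(group, counter, amount).
    static void incrCounter(String group, String name, long amount) {
        counters.merge(group + "." + name, amount, Long::sum);
    }

    public static void main(String[] args) throws Exception {
        String hostname = InetAddress.getLocalHost().getHostName();
        for (String record : new String[] {"foo bar", "baz"}) {
            incrCounter("BytesPerNode", hostname, record.length());
        }
        System.out.println(counters);
    }
}
```

Because the counter is keyed by whatever hostname the task runs on, no code change is needed when nodes join or leave the cluster, and the per-node totals show up alongside the job's other counters.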