Hadoop >> mail # user >> Re: How to modify hadoop-wordcount example to display File-wise results.


aaron_v 2012-03-29, 19:19
Re: How to modify hadoop-wordcount example to display File-wise results.
Hi Aaron,
I think this can be done with counters.
You can define a counter for each node in your cluster and then, in the map method, increment the node-specific counter by checking the hostname or IP address.
It's not a very good solution: you will need to modify your code whenever a node is added to or removed from the cluster, and there will be as many if conditions in the code as there are nodes. You can try this if you do not find a cleaner solution. I wish this kind of counter were part of the predefined counters.
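The if chain can be avoided with dynamically named counters: in the old mapred API, `Reporter.incrCounter(String group, String counter, long amount)` creates a counter on first use, so the hostname itself can be the counter name. The sketch below simulates that behavior in plain Java; the host names and records are made up for illustration, and in a real job the increment would go inside the map method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerHostCounters {
    // Mimics Reporter.incrCounter(group, counter, amount): counters are
    // created on first use, so no per-node if/else chain is needed.
    static final Map<String, Long> counters = new LinkedHashMap<>();

    static void incrCounter(String group, String counter, long amount) {
        counters.merge(group + "::" + counter, amount, Long::sum);
    }

    public static void main(String[] args) {
        // Hypothetical records, tagged with the host that mapped each one.
        String[] hostPerRecord = {"node1", "node2", "node1", "node3"};
        for (String host : hostPerRecord) {
            incrCounter("RecordsPerHost", host, 1);
        }
        System.out.println(counters);
        // {RecordsPerHost::node1=2, RecordsPerHost::node2=1, RecordsPerHost::node3=1}
    }
}
```

In a real mapper you would call `reporter.incrCounter("RecordsPerHost", InetAddress.getLocalHost().getHostName(), 1)` and read the totals from the job's counter page after completion.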
Regards,
Ajay Srivastava
On 30-Mar-2012, at 12:49 AM, aaron_v wrote:

>
> Hi people, I'm new to Nabble and Hadoop. I was having a look at the wordcount
> program. Can someone please let me know how to find which data gets mapped
> to which node? I have a master node 0 and four other nodes 1-4,
> and I ran the wordcount successfully, but I would like to print, for each
> node, how much data it got from the input data file. Any suggestions?
>
> us latha wrote:
>>
>> Hi,
>>
>> Inside the map method, I made the following change to Example: WordCount v1.0
>> in the MapReduce tutorial at
>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>> ------------------
>> String filename;
>> ...
>> filename = ((FileSplit) reporter.getInputSplit()).getPath().toString();
>> while (tokenizer.hasMoreTokens()) {
>>     word.set(tokenizer.nextToken() + " " + filename);
>> --------------------
>>
>> Worked great!! Thanks to everyone!
>>
>> Regards,
>> Srilatha
>>
>>
>> On Sat, Oct 18, 2008 at 6:24 PM, Latha <[EMAIL PROTECTED]> wrote:
>>
>>> Hi All,
>>>
>>> Thankyou for your valuable inputs in suggesting me the possible solutions
>>> of creating an index file with following format.
>>> word1 filename count
>>> word2 filename count.
>>>
>>> However, following is not working for me. Please help me to resolve the
>>> same.
>>>
>>> --------------------------
>>> public static class Map extends MapReduceBase
>>>         implements Mapper<LongWritable, Text, Text, Text> {
>>>     private Text word = new Text();
>>>     private Text filename = new Text();
>>>
>>>     public void map(LongWritable key, Text value,
>>>             OutputCollector<Text, Text> output, Reporter reporter)
>>>             throws IOException {
>>>         filename.set(((FileSplit) reporter.getInputSplit())
>>>                 .getPath().toString());
>>>         String line = value.toString();
>>>         StringTokenizer tokenizer = new StringTokenizer(line);
>>>         while (tokenizer.hasMoreTokens()) {
>>>             word.set(tokenizer.nextToken());
>>>             output.collect(word, filename);
>>>         }
>>>     }
>>> }
>>>
>>> public static class Reduce extends MapReduceBase
>>>         implements Reducer<Text, Text, Text, Text> {
>>>     public void reduce(Text key, Iterator<Text> values,
>>>             OutputCollector<Text, Text> output, Reporter reporter)
>>>             throws IOException {
>>>         int sum = 0;
>>>         Text filename = new Text();
>>>         while (values.hasNext()) {
>>>             sum++;
>>>             filename.set(values.next().toString());
>>>         }
>>>         filename.set(filename.toString() + " " + sum);
>>>         output.collect(key, filename);
>>>     }
>>> }
>>>
>>> --------------------------
>>> 08/10/18 05:38:25 INFO mapred.JobClient: Task Id :
>>> task_200810170342_0010_m_000000_2, Status : FAILED
>>> java.io.IOException: Type mismatch in value from map: expected
>>> org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
>>>        at
>>> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>>        at org.myorg.WordCount$Map.map(WordCount.java:23)
>>>        at org.myorg.WordCount$Map.map(WordCount.java:13)
>>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
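The stack trace above points at the job configuration, not the map method itself: the original WordCount driver declares `IntWritable` as the output value class, so the framework rejects the `Text` values this map now emits. A hedged sketch of the relevant old-API driver lines (class name is a placeholder for whatever the actual driver uses):

```java
JobConf conf = new JobConf(WordCount.class);
conf.setOutputKeyClass(Text.class);
// The original example set IntWritable here; since the map now emits
// Text values, the value class must be Text as well. If the reduce
// output type differed from the map output type, the map side could
// instead be declared separately with setMapOutputValueClass(Text.class).
conf.setOutputValueClass(Text.class);
```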
Raj Vishwanathan 2012-03-30, 02:56