Re: MapReduce Tutorial tweak
To add to that, the serialization libraries supported by default are listed in
core-default.xml as:

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
  <description>A list of serialization classes that can be used for
 obtaining serializers and deserializers.</description>
</property>

Since Java's built-in serialization is not in that default list, you would
need to convert your objects to Writable types (the *Writable classes such as
Text and IntWritable), which Hadoop can serialize more compactly.
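
To illustrate the compactness point, here is a rough sketch using only the
JDK. SimpleWritable is a hypothetical stand-in that copies the two methods of
Hadoop's Writable interface, and WordCount is a made-up record type; neither
is a Hadoop class. Hadoop's own Text and IntWritable use their own encodings,
so writeUTF and writeInt below are simplifications and the byte counts are
only indicative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class WritableSketch {

    // Hypothetical stand-in mirroring the two methods of
    // org.apache.hadoop.io.Writable; not the Hadoop interface itself.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Made-up record that supports both styles so the sizes can be compared.
    static class WordCount implements SimpleWritable, Serializable {
        private static final long serialVersionUID = 1L;
        String word = "";
        int count;

        WordCount() {}

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        // Writes only the field values: no class name, no field metadata.
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(word);
            out.writeInt(count);
        }

        // Reads the fields back in the same order they were written.
        @Override
        public void readFields(DataInput in) throws IOException {
            word = in.readUTF();
            count = in.readInt();
        }
    }

    static byte[] toWritableBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        }
        return buf.toByteArray();
    }

    static byte[] toJavaSerializedBytes(Serializable s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(s);
        }
        return buf.toByteArray();
    }

    static WordCount fromWritableBytes(byte[] bytes) throws IOException {
        WordCount wc = new WordCount();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            wc.readFields(in);
        }
        return wc;
    }

    public static void main(String[] args) throws IOException {
        WordCount wc = new WordCount("hadoop", 1);
        byte[] compact = toWritableBytes(wc);
        byte[] javaSer = toJavaSerializedBytes(wc);
        System.out.println("Writable-style bytes:  " + compact.length);
        System.out.println("Java-serialized bytes: " + javaSer.length);
        WordCount copy = fromWritableBytes(compact);
        System.out.println("Round trip: " + copy.word + " " + copy.count);
    }
}
```

The Writable-style form carries just the field values (12 bytes for this
record), whereas Java serialization also writes a stream header and a class
descriptor, so it comes out noticeably larger for small records like the
word/count pairs in the tutorial.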

Regards
Ravi Magham
On Tue, Aug 27, 2013 at 9:27 PM, Shahab Yunus <[EMAIL PROTECTED]> wrote:

> As far as I understand, StringTokenizer.nextToken returns a Java String,
> which does not implement Hadoop's Writable interface (keys are expected to
> be WritableComparable) that Hadoop MapReduce needs for serialization and
> transport. The Text class does implement it, which is why it is used to
> wrap the Java String before passing it on.
>
> Regards,
> Shahab
>
>
> On Tue, Aug 27, 2013 at 11:16 AM, Andrew Pennebaker <[EMAIL PROTECTED]> wrote:
>
>> In https://hadoop.apache.org/docs/stable/mapred_tutorial.html#Source+Code,
>> line 16 declares:
>>
>> private Text word = new Text();
>>
>> ...
>>
>> But only lines 22 and 23 use this, and only to pass the value along to
>> output:
>>
>> word.set(tokenizer.nextToken());
>> output.collect(word, one);
>>
>> Wouldn't this be better expressed as:
>>
>> (no private Text word)
>>
>> ...
>>
>> output.collect(tokenizer.nextToken(), one);
>>
>> ?
>>
>
>