Hadoop, mail # user - Basic Question


Re: Basic Question
Harsh J 2012-08-07, 18:33
Each write call emits one KV pair to the output. The output collector
does not compare records or try to de-duplicate them. It also copies
the key and value at the time of the call, so reusing the same object
for later writes does not affect pairs that were already written.

So you will get two KV pairs in your output, since duplicates are
allowed and are normal in many MR jobs. Think of wordcount, where a
map() call may emit many ("is", 1) pairs if the line it processes
contains "is" several times, and it can reuse one object via set()
calls to avoid creating too many objects.
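
To make the copy-on-write behaviour concrete, here is a minimal sketch in
plain Java. It does not use Hadoop at all: MutableKey and CopyingCollector
are hypothetical stand-ins for Text and the output collector, written only
to illustrate the semantics described above (each write takes a snapshot,
and nothing is de-duplicated). The real collector serializes records into
a buffer; the details here are an assumption made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {

    /** Hypothetical stand-in for a reusable, mutable key such as Text. */
    static final class MutableKey {
        private byte[] bytes = new byte[0];

        void set(String s) {
            bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        byte[] copyBytes() {
            return bytes.clone();
        }
    }

    /** Hypothetical stand-in for the collector: snapshots on every write, never de-dupes. */
    static final class CopyingCollector {
        private final List<String[]> records = new ArrayList<>();

        void write(MutableKey key, String value) {
            // Snapshot the key's current contents; later set() calls cannot change this record.
            String keySnapshot = new String(key.copyBytes(), StandardCharsets.UTF_8);
            records.add(new String[] { keySnapshot, value });
        }

        List<String[]> records() {
            return records;
        }
    }

    /** Mirrors the snippet from the question, using one reused key object. */
    public static List<String[]> run() {
        MutableKey zip = new MutableKey();
        CopyingCollector collector = new CopyingCollector();

        zip.set("9099");
        collector.write(zip, "value");
        zip.set("9099");
        collector.write(zip, "value1");

        // Mutating the key after the writes leaves the earlier records untouched.
        zip.set("1234");
        return collector.records();
    }

    public static void main(String[] args) {
        for (String[] kv : run()) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

In this sketch both records survive with key "9099", which matches the
two KV pairs you should expect in the real job.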

On Tue, Aug 7, 2012 at 11:56 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> In Mapper I often use a global Text object and throughout the map processing
> I just call "set" on it. My question is, what happens if the collector receives
> the same byte array value again? Does the last one overwrite the value in the
> collector? So if I did
>
> Text zip = new Text();
> zip.set("9099");
> collector.write(zip,value);
> zip.set("9099");
> collector.write(zip,value1);
>
> Should I expect to receive both values in reducer or just one?

--
Harsh J