Hadoop, mail # user - reducers and data locality


Re: reducers and data locality
Bejoy KS 2012-04-27, 09:24
Hi Mete

A custom Partitioner class can control which reducer each key is sent to, so you decide the key-to-reducer mapping yourself.
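
As a rough sketch of that idea (not code from this thread): in Hadoop's Java API a custom partitioner extends `org.apache.hadoop.mapreduce.Partitioner` and overrides `getPartition(key, value, numPartitions)`. The routing logic itself is plain Java, shown below without the Hadoop dependency so it runs on its own. The `hostname:shardId` key format and the `ShardRouting` class name are assumptions made up for illustration.

```java
// Hypothetical routing logic for keys shaped like "hostname:shardId".
// In a real job this would be the body of Partitioner.getPartition(...).
final class ShardRouting {
    private ShardRouting() {}

    /**
     * Returns the reducer index for a key, using only the shard id after
     * the last ':' so that every record of a shard reaches one reducer.
     */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int sep = key.lastIndexOf(':');
        // Keys without a prefix are treated as a bare shard id.
        int shardId = Integer.parseInt(key.substring(sep + 1).trim());
        // floorMod keeps the result in [0, numPartitions) even for negative ids.
        return Math.floorMod(shardId, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("node07:3", 10));
        System.out.println(partitionFor("node02:13", 10));
    }
}
```

A class like this would be registered with `job.setPartitionerClass(...)`. Note that a partitioner only decides which reduce task receives a key; it does not decide which node that reduce task runs on.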
Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: mete <[EMAIL PROTECTED]>
Date: Fri, 27 Apr 2012 09:19:21
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: reducers and data locality

Hello folks,

I have a lot of input splits (10k-50k, 128 MB blocks) that contain text
files. I need to process them line by line, then copy the results into
"shards" of roughly equal size.

So I generate a random key (from the range [0:numberOfShards]) which is
used to route the map output to different reducers, so the shard sizes are
more or less equal.
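
A minimal sketch of the scheme described above, assuming integer keys, a half-open range [0, numberOfShards), and Hadoop's default `HashPartitioner`, which picks the reducer as `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class and method names are hypothetical, and the Hadoop types are left out so the snippet runs standalone.

```java
import java.util.Random;

// Hypothetical stand-in for the map-side key choice described above.
final class RandomShardKey {
    private RandomShardKey() {}

    /** Picks a shard id uniformly from [0, numberOfShards). */
    static int nextShard(Random rng, int numberOfShards) {
        return rng.nextInt(numberOfShards);
    }

    /** Same arithmetic as the default hash partitioning, applied to an int key. */
    static int defaultPartition(int key, int numReduceTasks) {
        // For an integer key the hash code is the value itself.
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numberOfShards = 8;
        int[] perReducer = new int[numberOfShards];
        Random rng = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            int shard = nextShard(rng, numberOfShards);
            perReducer[defaultPartition(shard, numberOfShards)]++;
        }
        // Print how many records each reducer would receive.
        for (int count : perReducer) {
            System.out.println(count);
        }
    }
}
```

With uniformly random keys the per-reducer counts come out close to each other, which matches the "more or less equal" shard sizes, but every map output record is still shuffled to a reducer chosen without regard to where the data lives.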

I know that this is not really efficient, and I was wondering if I could
somehow control how keys are routed.
For example, could I generate the random keys with hostname prefixes and
control which keys are sent to each reducer? What do you think?

Kind regards
Mete