The framework sorts the mapper output by key, so all the values associated
with the same key go to the same reducer. As a result, the reducer gets a key
and a list of values as its input. Your assumption seems correct to me.
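
In case it helps, here is a minimal sketch of the averaging reducer, assuming
the job runs through Hadoop Streaming with Python (the defaultdict mention
suggests that, but it is an assumption on my part). In the Java API the reducer
is handed a key together with an iterable of its values. With Streaming, the
reducer script instead reads sorted "key<TAB>value" lines on stdin, so the
grouping has to happen in the script. Because the lines arrive sorted by key,
itertools.groupby can do this without keeping a dictionary of every key in
memory. The name average_by_key and the tab separator are illustrative choices,
not anything fixed by Hadoop.

```python
import sys
from itertools import groupby


def parse(lines, sep="\t"):
    """Yield (key, float value) pairs from 'key<sep>value' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split(sep, 1)
        yield key, float(value)


def average_by_key(lines, sep="\t"):
    """Yield (key, average) pairs.

    Assumes the input is already sorted by key, which is what a
    streaming reducer sees on stdin. groupby only merges adjacent
    equal keys, so unsorted input would give several partial groups.
    """
    pairs = parse(lines, sep)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = 0.0
        count = 0
        for _, value in group:
            total += value
            count += 1
        yield key, total / count


# Hypothetical use as a streaming reducer script:
# for key, avg in average_by_key(sys.stdin):
#     print(f"{key}\t{avg}")
```

The groupby version only holds one key's running total at a time, whereas a
defaultdict(list) keeps every value for every key until the input ends. Both
give the same averages on sorted input.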
On Thu, Nov 22, 2012 at 1:20 AM, jamal sasha <[EMAIL PROTECTED]> wrote:
> I guess I am asking a lot of fundamental questions, but I thank you guys for
> taking the time to explain my doubts.
> So I am able to write MapReduce jobs, but here is my doubt.
> As of now I am writing mappers which emit a key and a value.
> This key-value pair is then captured at the reducer end, and I process the key
> and value there.
> Let's say I want to calculate the average...
> Key1 value1
> Key2 value2
> Key1 value3
> So the output is something like
> Key1 average of value1 and value3
> Key2 average = value2
> Right now in the reducer I have to create a dictionary with the original
> keys as keys and a list of values for each key.
> Data = defaultdict(list)  // Python user
> But I thought that the
> mapper takes in the key-value pairs and outputs key: (v1, v2, ...), and the
> reducer takes in this key and list of values and returns
> key, new value.
> So why is the input of the reducer the plain output of the mapper and not the
> list of all the values for a particular key, or did I misunderstand something?
> Am I making any sense?