jamal sasha 2012-11-21, 19:50
Hi,

I guess I am asking a lot of fundamental questions, but thank you all for taking the time to explain my doubts.

I am able to write MapReduce jobs, but here is my doubt. As of now I am writing mappers which emit a key and a value. This key-value pair is then received at the reducer end, and I process the key and value there. Let's say I want to calculate the average:

Key1 value1
Key2 value2
Key1 value3

So the output is something like:

Key1 average of value1 and value3
Key2 average of value2 (= value2)

Right now, in the reducer, I have to create a dictionary with the original keys as keys and a list as each value:

Data = defaultdict(list)   (I am a Python user)

But I thought that the mapper takes in key-value pairs and outputs key: (v1, v2, ...), and the reducer takes in this key and the list of values and returns key, new value.

So why is the input of the reducer the plain output of the mapper and not the list of all the values for a particular key? Or did I misunderstand something? Am I making any sense?
Mohammad Tariq 2012-11-21, 19:58
Hello Jamal,
For efficient processing, the map output is sorted by key, and all the values associated with the same key go to the same reducer. As a result, the reducer gets a key and a list of values as its input. To me, your assumption seems correct.
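As a rough illustration of the sort-and-group step described above, here is a small pure-Python sketch. It does not use Hadoop at all: `shuffle` and `reduce_average` are hypothetical helper names, and the numeric values are made up to stand in for value1, value2 and value3 from the question.

```python
from itertools import groupby
from operator import itemgetter


def shuffle(mapped_pairs):
    """Sort (key, value) pairs by key and yield (key, [values]) per key.

    This only imitates what the framework does between map and reduce.
    """
    ordered = sorted(mapped_pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_average(key, values):
    """Reduce step: receives one key and all of its values."""
    return key, sum(values) / len(values)


# Hypothetical map output, in the order the mappers happened to emit it.
mapped = [("Key1", 1.0), ("Key2", 2.0), ("Key1", 3.0)]

results = [reduce_average(key, values) for key, values in shuffle(mapped)]
print(results)  # [('Key1', 2.0), ('Key2', 2.0)]
```

The reducer function never builds a dictionary itself; the grouping has already happened by the time it is called.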
Regards, Mohammad Tariq
Bejoy KS 2012-11-21, 20:03
Hi Jamal
It is performed at the framework level. Map emits key-value pairs, and the framework collects and groups all the values corresponding to a key from all the map tasks. The reducer then takes a key and a collection of values as its input; the reduce method signature defines it.

Regards
Bejoy KS
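One possible reason the dictionary seemed necessary, assuming the Python reducer runs as a Hadoop Streaming script (the thread does not say so): a streaming reducer reads sorted `key<TAB>value` lines from standard input rather than being called once per key with a collection. Because the lines arrive sorted by key, the values for a key are contiguous and can be averaged without holding everything in a `defaultdict`. A minimal sketch, where `parse` and `average_by_key` are hypothetical names and the tab-separated format is the streaming default:

```python
from itertools import groupby


def parse(lines):
    """Turn 'key<TAB>value' lines into (key, float(value)) pairs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield key, float(value)


def average_by_key(lines):
    """Yield (key, average) for input lines that are already sorted by key."""
    for key, group in groupby(parse(lines), key=lambda pair: pair[0]):
        total = 0.0
        count = 0
        for _, value in group:
            total += value
            count += 1
        yield key, total / count


# Sample input in the order a reducer would see it (sorted by key).
sample = ["Key1\t1\n", "Key1\t3\n", "Key2\t2\n"]
for key, avg in average_by_key(sample):
    print(f"{key}\t{avg}")
```

In an actual streaming reducer, `sys.stdin` would be passed in place of `sample`. With the Java API the framework hands the grouped values to the reduce method directly, as described above.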
Sent from handheld, please excuse typos.
-----Original Message----- From: jamal sasha <[EMAIL PROTECTED]> Date: Wed, 21 Nov 2012 14:50:51 To: [EMAIL PROTECTED]<[EMAIL PROTECTED]> Reply-To: [EMAIL PROTECTED] Subject: fundamental doubt
jamal sasha 2012-11-21, 20:24
Got it. Thanks for the clarification.