RE: Determine the key of Map function
Devaraj k 2012-04-24, 05:21
As per my understanding of your problem description, you need to do the following:
1. Mapper: Write a mapper that reads records from the input files and converts them into keys and values. The key should contain the teacher id, class id and number of students; the value can be empty (or null).
2. Partitioner: Write a custom partitioner to send all the records for a teacher id to one reducer.
3. Grouping Comparator: Write a comparator to group the records based on teacher id.
4. Sorting Comparator: Write a comparator to sort the records based on teacher id and number of students.
5. Reducer: In the reducer, you will get the records for all teachers one after the other, sorted by number of students within each teacher id. You can keep as many top records as you want in the reducer and write them out at the end.
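As a rough illustration of the five steps above, here is a minimal plain-Java sketch that simulates the flow in memory, without the Hadoop API. The class name TopClassesSketch, the ClassKey type, the helper names and the choice of descending order by number of students are all assumptions made for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory simulation of the mapper / partitioner /
// comparators / reducer steps. It does NOT use the Hadoop API.
class TopClassesSketch {

    // Composite key: teacher id, class id and number of students.
    static final class ClassKey {
        final String teacherId;
        final String classId;
        final int students;

        ClassKey(String teacherId, String classId, int students) {
            this.teacherId = teacherId;
            this.classId = classId;
            this.students = students;
        }
    }

    // Step 1 (mapper): parse a line such as "Teacher1, Class11, 30".
    static ClassKey map(String line) {
        String[] f = line.split(",");
        return new ClassKey(f[0].trim(), f[1].trim(), Integer.parseInt(f[2].trim()));
    }

    // Step 2 (partitioner): only the teacher id decides the reducer,
    // so every record of one teacher goes to the same reducer.
    static int partition(ClassKey key, int numReducers) {
        return (key.teacherId.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Step 4 (sorting comparator): teacher id ascending, then number of
    // students descending (assumed, so the largest classes come first).
    static final Comparator<ClassKey> SORT = (a, b) -> {
        int byTeacher = a.teacherId.compareTo(b.teacherId);
        if (byTeacher != 0) {
            return byTeacher;
        }
        return Integer.compare(b.students, a.students);
    };

    // Steps 3 and 5 (grouping + reducer): group the sorted keys by teacher id
    // and keep only the first n class ids seen for each teacher.
    static Map<String, List<String>> topPerTeacher(List<String> lines, int n) {
        List<ClassKey> keys = new ArrayList<>();
        for (String line : lines) {
            keys.add(map(line));
        }
        keys.sort(SORT); // stands in for the shuffle/sort phase

        Map<String, List<String>> out = new LinkedHashMap<>();
        for (ClassKey k : keys) {
            List<String> top = out.computeIfAbsent(k.teacherId, t -> new ArrayList<>());
            if (top.size() < n) {
                top.add(k.classId);
            }
        }
        return out;
    }
}
```

Calling TopClassesSketch.topPerTeacher(lines, 3) on the sample records from the question would keep the three largest classes per teacher. In a real job the same logic would be split across Mapper, Partitioner and Reducer classes, with the two comparators registered on the Job through setSortComparatorClass and setGroupingComparatorClass.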
You can refer to this doc for reference:
From: Lac Trung [[EMAIL PROTECTED]]
Sent: Tuesday, April 24, 2012 10:11 AM
To: [EMAIL PROTECTED]
Subject: Re: Determine the key of Map function
Ah, as I said before, I have no experience at programming MapReduce. So,
can you give me some documents or websites or something about programming
the thing you said above? ("Thousand things start hard" - VietNam)
Thanks so much ^^!
On 24 April 2012 at 10:54, Lac Trung <[EMAIL PROTECTED]> wrote:
> Thanks Jay so much !
> I will try this.
> On 24 April 2012 at 10:52, Jay Vyas <[EMAIL PROTECTED]> wrote:
>> Ahh... Well then the key will be teacher, and the value will simply be
>> <-1 * # students, class_id> .
>> Then, you will see in the reducer that the first 3 entries will always be
>> the ones you wanted.
>> On Mon, Apr 23, 2012 at 10:17 PM, Lac Trung <[EMAIL PROTECTED]> wrote:
>> > Hi Jay !
>> > I think it's a bit different here. I want to get, for each teacherId,
>> > the 30 classIds that have the most students.
>> > For example, to get 3 classIds:
>> > (File1)
>> > 1) Teacher1, Class11, 30
>> > 2) Teacher1, Class12, 29
>> > 3) Teacher1, Class13, 28
>> > 4) Teacher1, Class14, 27
>> > ... n ...
>> > n+1) Teacher2, Class21, 45
>> > n+2) Teacher2, Class22, 44
>> > n+3) Teacher2, Class23, 43
>> > n+4) Teacher2, Class24, 42
>> > ... n+m ...
>> > => return the 3 lines 1, 2, 3 for Teacher1 and lines n+1, n+2, n+3 for Teacher2.
>> > On 24 April 2012 at 09:52, Jay Vyas <[EMAIL PROTECTED]> wrote:
>> > > It's somewhat tricky to understand exactly what you need from your
>> > > explanation, but I believe you want the teachers who have the most
>> > > students in a given class. So for English, I have 10 teachers teaching
>> > > the class and I want the ones with the highest # of students.
>> > >
>> > > You can output key= <classid>, value=<-1*#ofstudent,teacherid> as the
>> > > values.
>> > >
>> > > The values will then be sorted by # of students. You can thus pick the
>> > > teacher in the first value of your reducer, and that will be the
>> > > teacher for class id = xyz with the highest number of students.
>> > >
>> > > You can also be smart in your mapper by running a combiner to remove
>> > > teacherids who are clearly not maximal.
>> > >
>> > > On Mon, Apr 23, 2012 at 9:38 PM, Lac Trung <[EMAIL PROTECTED]> wrote:
>> > >
>> > > > Hello everyone !
>> > > >
>> > > > I have a problem with MapReduce [:(] like this:
>> > > > I have 4 input files with 3 fields: teacherId, classId, numberOfStudent
>> > > > (numberOfStudent is ordered descending for each teacher).
>> > > > The output is the top 30 classIds by numberOfStudent for each teacherId.
>> > > > My approach is MapReduce like the Wordcount example. But I don't know
>> > > > how to determine the key for the map function.
>> > > > I ran the Wordcount example and understand its code, but I have no
>> > > > experience at programming MapReduce.
>> > > >