Re: Restricting number of records from map output
Niels Basjes 2011-01-14, 16:46
> I have a sort job consisting of only the Mapper (no Reducer) task. I want my
> results to contain only the top n records. Is there any way of restricting
> the number of records that are emitted by the Mappers?
> Basically I am looking to see if there is an equivalent of achieving
> the behavior similar to LIMIT in SQL queries.
I think I understand your goal. However, the question is aimed at what
I think is the wrong solution.
A mapper gets one record at a time as input and only knows about that
one record, so there is no way to apply a limit there.
If you implement a simple reducer, you can very easily let it stop
reading the input iterator after N records and limit the output that
way.
Doing it in the reducer also allows you to easily add a concept of
"Top N" by using the "Secondary Sort" trick to sort the input before
it arrives at the reducer.
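
To make the reducer idea concrete, here is a rough sketch of the "stop
after N records" loop. It is plain Java and deliberately uses no Hadoop
classes: the class LimitSketch and the helper firstN are made-up names
for illustration, and the Iterator stands in for the values iterator a
reducer receives. It also assumes that all records reach one reduce
call in already sorted order; with several reducers, each one would
emit its own N records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-alone model of the reducer-side limit; no Hadoop classes are used.
final class LimitSketch {

    // Hypothetical helper: copies at most n values from the iterator and then
    // stops reading, as a reducer would stop consuming its input iterator.
    static <T> List<T> firstN(Iterator<T> values, int n) {
        List<T> emitted = new ArrayList<>();
        while (emitted.size() < n && values.hasNext()) {
            emitted.add(values.next());
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Assumes the values already arrive sorted (here: descending),
        // which is what the secondary sort would provide.
        Iterator<Integer> sorted = Arrays.asList(9, 7, 5, 3).iterator();
        System.out.println(firstN(sorted, 2)); // prints [9, 7]
    }
}
```

In a real job the same loop would sit inside the reducer's reduce
method, writing each value to the output instead of collecting it in a
list.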