MapReduce >> mail # user >> Ignore keys while scheduling reduce jobs

Re: Ignore keys while scheduling reduce jobs

Does the mapper know which point is the first one in the data set, and which
cluster id it belongs to? I don't know much about the k-means algorithm,
so I may be wrong.

If the mappers have this information, each map task can use the cluster data
to check whether a cluster id is the one the first point belongs to. It can
then emit a record only when that condition holds and ignore all other records.
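The filtering idea above can be sketched as a local Python simulation. This is not Hadoop API code. It assumes every map task already knows the current centroids and the target cluster id, for example because the driver computed the id from the first input record and passed it along with the job. `nearest_cluster`, `target_from_first_line` and `filtering_map` are hypothetical names, and the comma-separated record format is an assumption. In a real job the same check would sit inside the `map()` method of a Java `Mapper`, or in a Hadoop Streaming mapper script.

```python
# Local simulation (not Hadoop API code) of the mapper-side filter.
# Assumption: every map task knows the current centroids and the target
# cluster id, e.g. computed once by the driver from the first input record.
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_cluster(point: Sequence[float], centroids: List[Point]) -> int:
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    best_id, best_dist = 0, float("inf")
    for cid, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:
            best_id, best_dist = cid, dist
    return best_id


def target_from_first_line(first_line: str, centroids: List[Point]) -> int:
    """Hypothetical driver-side step: parse the first input record (assumed to be
    comma-separated coordinates) and return the id of its nearest centroid."""
    point = tuple(float(x) for x in first_line.strip().split(","))
    return nearest_cluster(point, centroids)


def filtering_map(points: Iterable[Point], centroids: List[Point],
                  target_id: int) -> Iterator[Tuple[int, Point]]:
    """Emit (cluster_id, point) only for points in target_id; drop all others."""
    for point in points:
        cid = nearest_cluster(point, centroids)
        if cid == target_id:
            yield cid, point
```

For example, with centroids `(0.0, 0.0)` and `(10.0, 10.0)` and a first line `1.0,1.0`, the target id is 0. Only points nearest `(0.0, 0.0)` are emitted, so nothing for cluster 1 reaches the shuffle.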

Then you can set up your job to have only one reducer, which will receive all
values for that single cluster id and process them.
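For the reduce side, the Java API sets the reducer count with `Job.setNumReduceTasks(1)`. With the mapper filter in place, that single reducer sees only the one cluster id. A minimal local sketch of what it might compute for k-means follows. `centroid_reduce` is a hypothetical name, and the mean-of-points update is the standard k-means step, assumed here rather than stated in the thread.

```python
# Local simulation (not Hadoop API code) of the single reducer.
# Assumption: the job runs one reduce task, so this one call receives every
# point emitted for the remaining cluster id; the update shown is the usual
# k-means step of replacing the centroid with the mean of its points.
from typing import Iterable, List, Sequence, Tuple


def centroid_reduce(cluster_id: int,
                    points: Iterable[Sequence[float]]) -> Tuple[int, Tuple[float, ...]]:
    """Return (cluster_id, new_centroid), where new_centroid is the mean of points."""
    totals: List[float] = []
    count = 0
    for point in points:
        if not totals:
            # Size the running sums from the first point's dimensionality.
            totals = [0.0] * len(point)
        for i, x in enumerate(point):
            totals[i] += x
        count += 1
    if count == 0:
        raise ValueError(f"no points received for cluster {cluster_id}")
    return cluster_id, tuple(total / count for total in totals)
```

Feeding it the two points emitted for cluster 0 in a small example, `(1.0, 1.0)` and `(0.0, 2.0)`, yields the new centroid `(0.5, 1.5)`.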


On Fri, Sep 14, 2012 at 4:56 PM, Aseem Anand <[EMAIL PROTECTED]> wrote:

> Hi,
> Consider it a single-iteration k-means clustering job in which I only
> want to schedule reduce tasks for the clusterId (the key in k-means)
> of the cluster that the 1st point in the dataset belongs to.
> I want to check the clusterId of the first point in the input file and
> run reduce tasks only for that specific clusterId.
> I think we would have to wait for all mappers to finish.
> Thanks,
> Aseem
> On Fri, Sep 14, 2012 at 4:43 PM, Hemanth Yamijala <
>> Hi,
>> When do you know the keys to ignore? You mentioned "after the map stage";
>> is this at the end of each map task, or at the end of all map tasks?
>> Thanks
>> hemanth
>> On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>> Is there any way I can ignore all keys except a certain key (determined
>>> after the map stage) so that only 1 reduce job starts, using a
>>> partitioner? If so, could someone suggest such a method?
>>> Regards,
>>> Aseem
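On the partitioner question: in Hadoop, `getPartition` has to assign every map output record to some reduce task, so a partitioner can steer keys but cannot ignore them. Partitioning also happens on the map side, so the target key must be known before map output is written. The hypothetical `target_key_partition` below is a Python simulation, not a real `Partitioner` subclass. It assumes two reduce tasks and isolates the target key in partition 0. The other keys are still shuffled to partition 1, and that reducer still runs, which is why filtering in the mapper avoids the extra work.

```python
# Local simulation (not a real Hadoop Partitioner subclass). It mirrors the
# contract of Partitioner.getPartition(key, value, numPartitions): a partition
# number in [0, num_partitions) must be returned for EVERY key, so records can
# be routed but never discarded.
def target_key_partition(key: int, target_key: int, num_partitions: int) -> int:
    """Send target_key to partition 0 and every other key to the last partition."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if key == target_key:
        return 0
    # With a single partition this is also 0, so the target key and the
    # unwanted keys end up at the same reducer.
    return num_partitions - 1
```

With two partitions, `target_key_partition(3, 3, 2)` returns 0 and `target_key_partition(5, 3, 2)` returns 1. With one partition, every key maps to 0.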