Re: Need some help for writing map reduce functions in hadoop-1.0.1 java
madhu phatak 2012-05-22, 13:45
You can go through the code of this project (
https://github.com/zinnia-phatak-dev/Nectar) to understand how complex
algorithms are implemented using M/R.
On Fri, May 18, 2012 at 12:16 PM, Ravi Joshi <[EMAIL PROTECTED]> wrote:
> I am writing my own map and reduce methods to implement the K-Means
> algorithm in Hadoop 1.0.1 in Java. I have found some example
> implementations of K-Means in Hadoop on blogs, but I don't want to copy
> their code; as a learner I want to implement it myself. So I just need some
> ideas/clues for it. Below is the work I have already done.
> I have Point and Cluster classes, both of which are Writable. The Point
> class has an x coordinate, a y coordinate and the Cluster to which the
> Point belongs. The Cluster class, on the other hand, has an ArrayList that
> stores all the Point objects belonging to that Cluster. The Cluster class
> also has a centroid variable. I hope I am on the right track (if not,
> please correct me).
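On the Point class: a minimal sketch of how its serialization could look. The write/readFields signatures match Hadoop's Writable interface, but the class name PointSketch, the integer clusterId field and the roundTrip helper are assumptions made for illustration. The sketch leaves out "implements Writable" so that it compiles without Hadoop on the classpath. Storing a cluster id rather than a Cluster object is one possible design choice, not the only one; it keeps each serialized Point small and avoids a Point/Cluster reference cycle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch. In a real job this class would also declare
// "implements org.apache.hadoop.io.Writable"; that is omitted here so the
// example compiles with the JDK alone.
class PointSketch {
    double x;
    double y;
    int clusterId = -1; // assumption: -1 means "not assigned to a cluster yet"

    PointSketch() {} // Writable types need a no-argument constructor

    PointSketch(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Same signature as Writable.write: serialize every field.
    public void write(DataOutput out) throws IOException {
        out.writeDouble(x);
        out.writeDouble(y);
        out.writeInt(clusterId);
    }

    // Same signature as Writable.readFields: read the fields back
    // in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        x = in.readDouble();
        y = in.readDouble();
        clusterId = in.readInt();
    }

    // Illustration only: serialize a point to bytes and read it back.
    static PointSketch roundTrip(PointSketch p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            p.write(new DataOutputStream(bytes));
            PointSketch copy = new PointSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout the Cluster class may not need its ArrayList of Points at all, since a reducer already receives all values that share a key as one group.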
> First of all, my input (a file containing the coordinates of some points)
> must be loaded into Point objects, i.e. every line of the input file must
> be mapped to a Point. This should be done ONCE in the map class (but
> how?). After each Point has been given its value, some random Clusters must
> be chosen in the initial phase (this must be done only ONCE, but how?). Now
> every Point must be mapped to all the Clusters along with the distance
> between that Point and each centroid. In the reduce method, every Point
> will be checked and assigned to the Cluster nearest to it (by comparing
> the distances). Then a new centroid is calculated for each Cluster. Should
> map and reduce be called recursively? If yes, where would all the
> initialization go? By initialization I mean providing input to the Point
> objects (which must be done ONCE initially) and choosing the random
> centroids (which must also be done ONCE initially).
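On the overall flow: a common way to structure this (not the only one) is to let a driver loop submit one map/reduce job per iteration instead of calling map and reduce recursively. The sketch below simulates that flow in plain Java with no Hadoop classes, so every name in it (KMeansSketch, nearest, mean, iterate, run, maxIterations, epsilon) is an assumption for illustration. It also differs from the plan described above in one respect: the "map" step picks the nearest centroid directly and emits (clusterId, point), so the "reduce" step only has to average the points it receives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java simulation of iterative K-Means; no Hadoop API is used.
class KMeansSketch {

    // "Map" step for one point: index of the nearest centroid.
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double dx = point[0] - centroids[i][0];
            double dy = point[1] - centroids[i][1];
            double dist = dx * dx + dy * dy; // squared distance is enough to compare
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // "Reduce" step for one cluster: mean of the points grouped under its id.
    static double[] mean(List<double[]> points) {
        double sumX = 0, sumY = 0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] { sumX / points.size(), sumY / points.size() };
    }

    // One iteration, standing in for one map/reduce job.
    static double[][] iterate(List<double[]> points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) { // the framework would do this grouping (shuffle)
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] next = new double[centroids.length][];
        for (int i = 0; i < centroids.length; i++) {
            List<double[]> group = groups.get(i);
            // A cluster that received no points keeps its old centroid.
            next[i] = (group == null) ? centroids[i].clone() : mean(group);
        }
        return next;
    }

    // Driver loop: initialization happens once, before this method is called.
    static double[][] run(List<double[]> points, double[][] initial,
                          int maxIterations, double epsilon) {
        double[][] current = initial;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] next = iterate(points, current);
            double moved = 0;
            for (int i = 0; i < next.length; i++) {
                moved = Math.max(moved, Math.hypot(next[i][0] - current[i][0],
                                                   next[i][1] - current[i][1]));
            }
            current = next;
            if (moved < epsilon) break; // centroids stopped moving
        }
        return current;
    }
}
```

In this arrangement nothing has to be loaded ONCE inside the map class: each job simply re-reads the input file, and the only state carried between iterations is the small set of centroids. In a real job each mapper would need those centroids before processing records; one assumption is that they are read from a shared file in Mapper.setup().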
> One more question: should the value of the parameter K (which decides the
> total number of clusters) be assigned by the user, or will Hadoop decide it
> itself?
> Somebody please explain this to me. I don't need the code; I want to write
> it myself. I just need an approach. Thank you.
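On K: Hadoop does not choose it, because K is a parameter of the K-Means algorithm and the framework only runs the map and reduce code it is given. The user supplies K to the driver, which can then pick the initial centroids once, before the first job is submitted. Below is a minimal sketch of that one-time step in plain Java. The class InitialCentroids, the method choose and the seed parameter are assumptions for illustration, and picking K random input points is just one simple initialization strategy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, run once by the driver before the first iteration.
class InitialCentroids {

    // Pick k distinct input points as starting centroids.
    // A fixed seed makes the choice reproducible between runs.
    static List<double[]> choose(List<double[]> points, int k, long seed) {
        if (k <= 0 || k > points.size()) {
            throw new IllegalArgumentException(
                    "k must be between 1 and the number of points, got " + k);
        }
        List<double[]> shuffled = new ArrayList<>(points); // leave the input untouched
        Collections.shuffle(shuffled, new Random(seed));
        List<double[]> centroids = new ArrayList<>();
        for (double[] p : shuffled.subList(0, k)) {
            centroids.add(p.clone()); // copy so later updates don't alter the data
        }
        return centroids;
    }
}
```

In a Hadoop job, one way to make K visible to the tasks is to store it in the job Configuration with setInt in the driver and read it back with getInt; the property name used for it is up to you.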