Re: Need some help for writing map/reduce functions in Hadoop 1.0.1 (Java)
Hi,
 You can go through the code of this project
(https://github.com/zinnia-phatak-dev/Nectar) to understand how complex
algorithms are implemented using M/R.

On Fri, May 18, 2012 at 12:16 PM, Ravi Joshi <[EMAIL PROTECTED]> wrote:

> I am writing my own map and reduce methods to implement the K-Means
> algorithm in Hadoop 1.0.1 in Java. I have found some example
> implementations of K-Means on Hadoop in blog posts, but I don't want to
> copy their code; as a learner I want to implement it myself. So I just
> need some ideas/clues. Below is the work I have already done.
>
> I have Point and Cluster classes, both of which are Writable. The Point
> class has an x coordinate, a y coordinate, and the Cluster to which the
> Point belongs. The Cluster class has an ArrayList storing all the Point
> objects that belong to that Cluster, plus a centroid variable. I hope I
> am on the right track (if not, please correct me).
>
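
That design can work, though a small sketch may help with the Writable
part. Here is a minimal illustration of what such a Point could look like
(everything below is illustrative, not taken from Nectar or any other
existing project):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Illustrative 2-D point that Hadoop can ship between map and reduce.
public class Point implements Writable {
    private double x;
    private double y;

    public Point() {}                   // Hadoop requires a no-arg constructor

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    public double getX() { return x; }
    public double getY() { return y; }

    // Euclidean distance, used to find the nearest centroid.
    public double distanceTo(Point other) {
        double dx = x - other.x;
        double dy = y - other.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(x);
        out.writeDouble(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readDouble();
        y = in.readDouble();
    }

    @Override
    public String toString() {          // "x,y" so job output is re-parseable
        return x + "," + y;
    }
}

Note that in M/R the Point does not strictly need to remember its Cluster,
and the Cluster does not need an ArrayList of Points: grouping all the
points of one cluster together is exactly what the shuffle between map and
reduce already does for you.
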
> Now, first of all, my input (a file containing point coordinates) must
> be loaded into Point objects. This input file must be mapped to the
> Points ONCE in the map class (but how?). After assigning values to each
> Point, some random Clusters must be chosen in the initial phase (this
> must also be done only ONCE, but how?). Then every Point must be mapped
> to every Cluster together with the distance between that Point and the
> Cluster's centroid. In the reduce method, every Point is checked and
> assigned to the Cluster nearest to it (by comparing the distances), and
> a new centroid is calculated for each Cluster. (Should map and reduce be
> called recursively? If yes, where would the initialization go? By
> initialization I mean loading the input into the Point objects, which
> must be done ONCE at the start, and choosing the random centroids, which
> is also done ONCE.)
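
A common way to structure one K-Means iteration as a single M/R job is
sketched below, under the assumption that the current centroids are small
enough for every mapper to load in setup(), e.g. from a side file on HDFS
or the DistributedCache. Class and field names are again just
illustrative:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map step: read one point per input line ("x,y") and emit
// (id of nearest current centroid, the point itself).
public class KMeansMapper extends Mapper<LongWritable, Text, IntWritable, Point> {
    private final List<Point> centroids = new ArrayList<Point>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Load the CURRENT centroids here, e.g. from a side file on HDFS
        // or the DistributedCache (omitted in this sketch).
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] parts = line.toString().split(",");
        Point p = new Point(Double.parseDouble(parts[0].trim()),
                            Double.parseDouble(parts[1].trim()));
        int nearest = 0;
        double best = Double.MAX_VALUE;
        for (int i = 0; i < centroids.size(); i++) {
            double d = p.distanceTo(centroids.get(i));
            if (d < best) { best = d; nearest = i; }
        }
        context.write(new IntWritable(nearest), p);
    }
}

// Reduce step: all points of one cluster arrive together; their mean
// becomes the cluster's new centroid.
public class KMeansReducer extends Reducer<IntWritable, Point, IntWritable, Point> {
    @Override
    protected void reduce(IntWritable clusterId, Iterable<Point> points, Context context)
            throws IOException, InterruptedException {
        double sumX = 0, sumY = 0;
        long n = 0;
        for (Point p : points) {
            sumX += p.getX();
            sumY += p.getY();
            n++;
        }
        context.write(clusterId, new Point(sumX / n, sumY / n));
    }
}

So map and reduce are not called recursively. A driver program runs this
job in a loop; each iteration's reduce output (the new centroids) becomes
the side file that the next iteration's mappers load. The only genuinely
ONCE step is choosing the K random starting centroids, which the driver
does before the first job; the input file itself is simply re-read by the
mappers in every iteration.
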
> One more question: should the value of the parameter K (which decides
> the total number of clusters) be assigned by the user, or will Hadoop
> decide it by itself?
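
As for K: Hadoop cannot decide it for you. K is a parameter of the
algorithm and must come from the user. A common way to hand it (and other
settings) to the tasks is the job Configuration, as in this rough driver
sketch (the paths and the property name are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KMeansDriver {
    public static void main(String[] args) throws Exception {
        int k = Integer.parseInt(args[0]);   // K is supplied by the user
        Configuration conf = new Configuration();
        conf.setInt("kmeans.k", k);          // tasks can read it back with
                                             // context.getConfiguration().getInt("kmeans.k", -1)

        // ONCE, before the first job: pick K random input points as the
        // starting centroids and write them to a side file (omitted).

        int maxIterations = 20;
        for (int iter = 0; iter < maxIterations; iter++) {
            Job job = new Job(conf, "kmeans-iter-" + iter);
            job.setJarByClass(KMeansDriver.class);
            job.setMapperClass(KMeansMapper.class);
            job.setReducerClass(KMeansReducer.class);
            job.setMapOutputKeyClass(IntWritable.class);
            job.setMapOutputValueClass(Point.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(Point.class);
            FileInputFormat.addInputPath(job, new Path(args[1]));
            FileOutputFormat.setOutputPath(job, new Path(args[2] + "/iter-" + iter));
            if (!job.waitForCompletion(true)) {
                System.exit(1);
            }
            // Compare the centroids written in iter-<n> with those of the
            // previous iteration and break out early once they stop moving
            // (omitted).
        }
    }
}
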
>
> Could somebody please explain? I don't need the code; I want to write it
> myself. I just need an approach. Thank you.
>
> -Ravi
>

--
https://github.com/zinnia-phatak-dev/Nectar