Re: Naïve k-means using hadoop
If you're also a fan of doing things a better way, you can also check
out how Apache Crunch (http://crunch.apache.org) approaches this via
https://github.com/cloudera/ml (blog post:
http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/).

On Wed, Mar 27, 2013 at 3:29 PM, Yaron Gonen <[EMAIL PROTECTED]> wrote:
> Hi,
> I'd like to implement k-means by myself, in the following naive way:
> Given a large set of vectors:
>
> 1. Generate k random centers from the set.
> 2. Mapper reads all the centers and a split of the vector set, and emits the
> closest center as the key for each vector.
> 3. Reducer calculates a new center and writes it.
> 4. Go to step 2 until the centers no longer change.
>
> My question is very basic: how do I distribute all the new centers (produced
> by the reducers) to all the mappers? I can't use the distributed cache since
> it's read-only. I can't use context.write since it will create a file for
> each reduce task, and I need a single file. The more general issue here is
> how to distribute data produced by the reducers to all the mappers?
>
> Thanks.

--
Harsh J
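
For reference, here is a minimal sketch of one common answer to the distribution question: run each k-means pass as its own MapReduce job, have the reducers write the new centers to an HDFS directory, and have the next job's mappers load that directory in setup(). Nothing below comes from this thread; the KMeansDriver class, the kmeans.centers.path property, and the comma-separated vector format are made-up assumptions for illustration.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Vectors are plain lines of comma-separated doubles; the current centers sit in
// an HDFS directory whose path is handed to each iteration's job configuration.
public class KMeansDriver {

  static double[] parseVector(String line) {
    String[] parts = line.trim().split(",");
    double[] v = new double[parts.length];
    for (int i = 0; i < parts.length; i++) v[i] = Double.parseDouble(parts[i]);
    return v;
  }

  // Mapper: loads the current centers once in setup(), then emits each vector
  // keyed by the index of its nearest center.
  public static class NearestCenterMapper extends Mapper<Object, Text, IntWritable, Text> {
    private final List<double[]> centers = new ArrayList<double[]>();

    @Override
    protected void setup(Context context) throws IOException {
      Path centersDir = new Path(context.getConfiguration().get("kmeans.centers.path"));
      FileSystem fs = centersDir.getFileSystem(context.getConfiguration());
      for (FileStatus status : fs.listStatus(centersDir)) {  // read every part-r-* file
        if (!status.isFile() || status.getPath().getName().startsWith("_")) continue;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(status.getPath())))) {
          String line;
          while ((line = in.readLine()) != null) centers.add(parseVector(line));
        }
      }
    }

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      double[] v = parseVector(value.toString());
      int best = 0;
      double bestDist = Double.MAX_VALUE;
      for (int i = 0; i < centers.size(); i++) {
        double d = 0;
        for (int j = 0; j < v.length; j++) { double diff = v[j] - centers.get(i)[j]; d += diff * diff; }
        if (d < bestDist) { bestDist = d; best = i; }
      }
      context.write(new IntWritable(best), value);
    }
  }

  // Reducer: averages the vectors assigned to one center and writes the new center.
  public static class CenterAveragingReducer extends Reducer<IntWritable, Text, NullWritable, Text> {
    @Override
    protected void reduce(IntWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      double[] sum = null;
      long count = 0;
      for (Text value : values) {
        double[] v = parseVector(value.toString());
        if (sum == null) sum = new double[v.length];
        for (int i = 0; i < v.length; i++) sum[i] += v[i];
        count++;
      }
      StringBuilder out = new StringBuilder();
      for (int i = 0; i < sum.length; i++) {
        if (i > 0) out.append(',');
        out.append(sum[i] / count);
      }
      context.write(NullWritable.get(), new Text(out.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);    // the full vector set
    Path centers = new Path(args[1]);  // directory holding the initial random centers
    for (int iteration = 0; iteration < 20; iteration++) {  // or loop until the centers stop moving
      conf.set("kmeans.centers.path", centers.toString());
      Job job = Job.getInstance(conf, "k-means iteration " + iteration);
      job.setJarByClass(KMeansDriver.class);
      job.setMapperClass(NearestCenterMapper.class);
      job.setReducerClass(CenterAveragingReducer.class);
      job.setMapOutputKeyClass(IntWritable.class);
      job.setMapOutputValueClass(Text.class);
      job.setOutputKeyClass(NullWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, input);
      Path nextCenters = new Path(args[1] + "-iter" + (iteration + 1));
      FileOutputFormat.setOutputPath(job, nextCenters);
      job.waitForCompletion(true);
      centers = nextCenters;  // the reducers' part files feed the next pass's mappers
    }
  }
}

Because every iteration is a fresh job, the read-only distributed cache is not really a blocker either: the driver can add the previous pass's center files to each new job's cache instead of opening HDFS in setup(). And since the mapper reads every part-r-* file under the centers directory, the reducer outputs never need to be merged into a single file; alternatively, configuring a single reducer produces one file directly.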