Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
MapReduce >> mail # user >> Re: Naïve k-means using hadoop


Copy link to this message
-
Re: Naïve k-means using hadoop
Thanks!
*Bertrand*: I don't like the idea of using a single reducer. A better way
for me is to write all the output of all the reducers to the same
directory, and then distribute all the files.
I know about Mahout of course, but I want to implement it myself. I will
look at the documentation though.
*Harsh*: I rather stick to Hadoop as much as I can, but thanks! I'll read
the stuff you linked.
On Wed, Mar 27, 2013 at 2:46 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> If you're also a fan of doing things the better way, you can also
> checkout some Apache Crunch (http://crunch.apache.org) ways of doing
> this via https://github.com/cloudera/ml (blog post:
> http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/).
>
> On Wed, Mar 27, 2013 at 3:29 PM, Yaron Gonen <[EMAIL PROTECTED]>
> wrote:
> > Hi,
> > I'd like to implement k-means by myself, in the following naive way:
> > Given a large set of vectors:
> >
> > Generate k random centers from set.
> > Mapper reads all center and a split of the vectors set and emits for each
> > vector the closest center as a key.
> > Reducer calculated new center and writes it.
> > Goto step 2 until no change in the centers.
> >
> > My question is very basic: how do I distribute all the new centers
> (produced
> > by the reducers) to all the mappers? I can't use distributed cache since
> its
> > read-only. I can't use the context.write since it will create a file for
> > each reduce task, and I need a single file. The more general issue here
> is
> > how to distribute data produced by reducer to all the mappers?
> >
> > Thanks.
>
>
>
> --
> Harsh J
>
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB