Hadoop >> mail # user >> KMeans clustering on Hadoop infrastructure


Re: KMeans clustering on Hadoop infrastructure
You are likely to get more help on the Mahout mailing list.

https://cwiki.apache.org/confluence/display/MAHOUT/Mailing+Lists,+IRC+and+Archives

--Bobby Evans

On 4/28/12 7:45 AM, "Lukáš Kryške" <[EMAIL PROTECTED]> wrote:
Hello,
I am successfully running the K-Means clustering sample from the 'Mahout in Action' book (the example in Chapter 7.3) in my Hadoop environment. Now I need to extend the program to read the vectors from a file located in HDFS. I need to cluster millions or billions of vectors, which are represented as comma-separated values in a .txt file in HDFS. The data are stored in this pattern:
x1,y1
x2,y2
....
xn,yn
As I understand from the book, I first need to transform my .txt file of vectors into a Hadoop SequenceFile. What is the most efficient way to do that? And how do I tell KMeansDriver that the input path contains a SequenceFile of vectors?

Thanks for help.

_________________
Best Regards,
Lukas Kryske
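
The parsing half of the conversion asked about above might look roughly like the sketch below. It assumes one comma-separated vector per line, which is a guess at the file layout, and the class and method names (VectorLineParser, parseLine, parseAll) are hypothetical. It uses only the Java standard library. In a real job, each parsed array would presumably be wrapped in Mahout's DenseVector and VectorWritable and appended through a Hadoop SequenceFile.Writer; those calls are left out here because their exact signatures depend on the Hadoop and Mahout versions in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns "x,y,..." text lines into double[] vectors.
// Assumes one vector per line; writing the result to a SequenceFile is not shown.
final class VectorLineParser {

    // Parses a single comma-separated line; returns null for a blank line.
    static double[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        String[] parts = trimmed.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Throws NumberFormatException on a malformed field.
            values[i] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    // Parses many lines, skipping blank ones.
    static List<double[]> parseAll(Iterable<String> lines) {
        List<double[]> vectors = new ArrayList<>();
        for (String line : lines) {
            double[] v = parseLine(line);
            if (v != null) {
                vectors.add(v);
            }
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for lines read from the .txt file.
        List<String> sample = Arrays.asList("1.0,2.0", "", "3.5, 4.5");
        for (double[] v : parseAll(sample)) {
            // In a real job this is where each vector would be handed to the writer.
            System.out.println(Arrays.toString(v));
        }
    }
}
```

For millions or billions of vectors, the same per-line parsing would more likely run inside a map task than in a single-process loop, so that the conversion is distributed across the cluster.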