Hadoop >> mail # user >> Re: best way to join?


Re: best way to join?
On Tue, Aug 28, 2012 at 9:48 AM, dexter morgan <[EMAIL PROTECTED]> wrote:

>
> I understand your solution (I think); I hadn't thought of it in that
> particular way.
> Say I have 1M data points and run kNN. Then k=1M and n=10 (each point
> is a cluster that requires up to 10 points), which I think is overkill.
>

I am not sure I understand you.  n = number of points.  k = number of
clusters.  For searching 1 million points, I would recommend thousands of
clusters.
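
To make the points-versus-clusters distinction concrete, here is a minimal sketch of cluster-based neighbor search in plain Python. It is not Mahout's implementation. The function names (`build_clusters`, `nearest_neighbors`), the `probes` parameter, the random choice of centroids, and the data sizes are all assumptions made for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_clusters(points, k, seed=0):
    """Pick k random points as centroids (an assumption; a real clustering
    step such as k-means would refine them) and bucket every point index
    under its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    buckets = [[] for _ in range(k)]
    for idx, p in enumerate(points):
        nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
        buckets[nearest].append(idx)
    return centroids, buckets


def nearest_neighbors(query_idx, points, centroids, buckets,
                      n_neighbors=10, probes=3):
    """Search only the `probes` clusters closest to the query point and
    return up to n_neighbors point indices, nearest first."""
    q = points[query_idx]
    closest = sorted(range(len(centroids)),
                     key=lambda c: dist2(q, centroids[c]))[:probes]
    candidates = [i for c in closest for i in buckets[c] if i != query_idx]
    candidates.sort(key=lambda i: dist2(q, points[i]))
    return candidates[:n_neighbors]


# Hypothetical data: n = 2000 points, k = 40 clusters.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(2000)]
centroids, buckets = build_clusters(points, k=40)
neighbors = nearest_neighbors(0, points, centroids, buckets)
print(neighbors)
```

Each query compares against the k centroids plus the points in a few clusters, not against all n points. Searching only a few clusters can miss a true neighbor that sits just across a cluster boundary, so `probes` trades accuracy for speed.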
>
> How can I achieve the same result WITHOUT using Mahout, just running on
> the dataset? I even think it will have the same complexity (O(n^2)).
>

Running with a good knn package will give you roughly O(n log n)
complexity.
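
As one illustration of where the better-than-quadratic behavior can come from, here is a sketch using a k-d tree, written in plain Python. The choice of a k-d tree is my assumption; the message does not name a package or a data structure. The names `build_kdtree`, `knn`, and `ten_nearest` are hypothetical. For low-dimensional data a single query visits roughly O(log n) nodes on average, so finding neighbors for all n points costs roughly O(n log n) instead of the O(n^2) of comparing every pair.

```python
import heapq
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, indices=None, depth=0):
    """Recursively split the point indices at the median, cycling through
    the axes. Re-sorting at every level keeps the sketch short."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(points, indices[:mid], depth + 1),
        "right": build_kdtree(points, indices[mid + 1:], depth + 1),
    }


def knn(node, points, query, k, heap=None, skip=None):
    """Collect the k nearest points to `query` in a max-heap of
    (negated squared distance, index) pairs; `skip` excludes one index."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    i = node["index"]
    if i != skip:
        d = dist2(query, points[i])
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    axis = node["axis"]
    diff = query[axis] - points[i][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn(near, points, query, k, heap, skip)
    # Cross the splitting plane only if it could hide a closer point.
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, points, query, k, heap, skip)
    return heap


# Hypothetical data: 2000 random 2-D points.
rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(2000)]
tree = build_kdtree(points)


def ten_nearest(i):
    """Indices of the 10 points nearest to point i, nearest first."""
    heap = knn(tree, points, points[i], 10, skip=i)
    return [idx for _, idx in sorted(heap, reverse=True)]


print(ten_nearest(0))
```

Unlike searching a few clusters, this search is exact: the pruning test only skips a subtree when no point in it can beat the current tenth-best distance. The benefit shrinks as the number of dimensions grows, which is one reason approximate methods are used for high-dimensional data.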