Hadoop, mail # user - Re: best way to join?


Re: best way to join?
Ted Dunning 2012-08-28, 15:32
On Tue, Aug 28, 2012 at 9:48 AM, dexter morgan <[EMAIL PROTECTED]> wrote:

>
> I understand your solution (I think); I hadn't thought of it in that
> particular way.
> Say I have 1M data points and run knn with k=1M and n=10 (each point
> is a cluster that requires up to 10 points). I think that is overkill.
>

I am not sure I understand you.  n = number of points.  k = number of
clusters.  For searching 1 million points, I would recommend thousands of
clusters.

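To illustrate how clusters can narrow a neighbor search, here is a minimal sketch in plain Python. It is not Mahout's implementation: the function names, the random choice of centroids, and the `probes` parameter are all assumptions made for the example. Each point is assigned to its nearest centroid, and a query then scans only the members of the few closest clusters, so the result is approximate unless every cluster is probed.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_clusters(points, num_clusters, seed=0):
    """Assign every point to the nearest of num_clusters sampled centroids.

    Assumption: centroids are simply sampled from the data; a real
    clustering step (for example k-means) would refine them.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, num_clusters)
    members = [[] for _ in centroids]
    for idx, p in enumerate(points):
        nearest = min(range(num_clusters), key=lambda c: dist2(p, centroids[c]))
        members[nearest].append(idx)
    return centroids, members

def nearest_neighbors(query_idx, points, centroids, members,
                      n_neighbors=10, probes=3):
    """Return up to n_neighbors point indices near points[query_idx].

    Only the `probes` clusters whose centroids are closest to the query
    are scanned, so neighbors in unprobed clusters can be missed.
    """
    q = points[query_idx]
    closest = sorted(range(len(centroids)),
                     key=lambda c: dist2(q, centroids[c]))[:probes]
    candidates = [i for c in closest for i in members[c] if i != query_idx]
    candidates.sort(key=lambda i: dist2(q, points[i]))
    return candidates[:n_neighbors]
```

Calling `build_clusters(points, 1000)` once and then `nearest_neighbors(i, points, centroids, members)` for each `i` avoids comparing every point with every other point; raising `probes` trades speed for accuracy.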
> How can I achieve the same result WITHOUT using Mahout, just running on
> the dataset? I even think it'll have the same complexity (O(n^2)).
>

Running with a good knn package will give you roughly O(n log n)
complexity.
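
As one example of where the sub-quadratic behavior can come from, here is a small k-d tree sketch in plain Python. It is an illustration under assumptions, not the method of any particular knn package: the names `build_kdtree` and `knn` are made up for the example, the simple build re-sorts at every level, and the pruning only pays off in low dimensions. The tree is built once, and each query skips any subtree that cannot hold a closer point than the ones already found.

```python
import heapq

def build_kdtree(points, indices=None, depth=0):
    """Build a k-d tree as nested tuples: (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through the coordinates
    indices.sort(key=lambda i: points[i][axis])
    mid = len(indices) // 2        # median point splits the space
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))

def knn(tree, points, query, k):
    """Return the indices of the k points nearest to query, closest first."""
    heap = []  # max-heap on distance, stored as (-squared_distance, index)

    def visit(node):
        if node is None:
            return
        idx, axis, left, right = node
        p = points[idx]
        d2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, idx))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, idx))
        diff = query[axis] - p[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Cross the splitting plane only if a closer point could lie beyond it.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(tree)
    return [idx for _, idx in sorted(heap, reverse=True)]
```

Note that if the query is itself one of the indexed points, it comes back as its own nearest neighbor, so asking for `k + 1` results and dropping the first gives the k other neighbors.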