Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
MapReduce, mail # user - Binary Search in map reduce


+
jamal sasha 2013-01-07, 23:21
+
John Lilley 2013-01-07, 23:35
+
jamal sasha 2013-01-07, 23:42
+
John Lilley 2013-01-08, 00:11
Copy link to this message
-
Re: Binary Search in map reduce
jamal sasha 2013-01-08, 00:17
awesome.
thanks
On Mon, Jan 7, 2013 at 4:11 PM, John Lilley <[EMAIL PROTECTED]>wrote:

>  Let’s call these “the graph” and “the changes”.****
>
> ** **
>
> Will both the graph and the changes fit into memory?****
>
> Yes -> You do not have a Hadoop-scale problem.  Just write some code using
> HashTable or Dictionary.****
>
> ** **
>
> Will the graph fit into memory once it is partitioned amongst all of the
> nodes?****
>
> Yes -> You can get away without a join.  Partition the graph and the
> changes like below, but instead of doing a join on each partition, stream
> the changes against the graph partition in memory, using a HashTable for
> the graph partition.****
>
> ** **
>
> Otherwise, you can do this in a few steps.  Realize that you are doing a
> parallel join.  A parallel join can be done in hadoop by a simple modulo of
> the keys of the graph and the changes.  So first, create a couple of MR
> jobs just to partition “the graph” and “the changes” into N buckets using
> (key%N).  I **think** this is pretty straightforward because if your
> mapper adds new_key=(key%N) to the tuple and you use N reducers you get
> this behavior automatically (is it really that simple? someone with more MR
> expertise please correct me…).   Once the graph and the changes are
> partitioned, run another MR job to (1) join each graph partition file to
> the corresponding changes partition file (2) process the changes into the
> graph (3) write out the resulting graph.  This part is not a parallel join;
> it is a bunch of independent simple joins.  Finally, merge the resulting
> graphs together.  ****
>
> ** **
>
> You may find that it isn’t even this easy.  If nothing fits into memory
> and you must perform a non-trivial graph traversal for each change record,
> you have something must harder to do.****
>
> ** **
>
> FYI top google results for joins in Hadoop here:
> https://www.google.com/search?q=joins+in+hadoop&aq=f&oq=joins+in+hadoop&aqs=chrome.0.57j60l2j0l2j62.670&sugexp=chrome,mod=14&sourceid=chrome&ie=UTF-8
> ****
>
> ** **
>
> john****
>
> ** **
>
> *From:* jamal sasha [mailto:[EMAIL PROTECTED]]
> *Sent:* Monday, January 07, 2013 4:43 PM
> *To:* [EMAIL PROTECTED]
> *Subject:* Re: Binary Search in map reduce****
>
> ** **
>
> Hi****
>
>  Thanks for the reply. So here is the intent.****
>
> I process some data and output of that processing is this set of json
> documents outputting {key:[values]}  (This is essentially a form of graph
> where each entry is an edge)****
>
> Now.. I process a different set of data and the idea is to modify the
> existing document based on this new data.****
>
> If the key is present then add/modify values.****
>
> Else... create new key:[values] json object and save.****
>
> ** **
>
> So, the first step is checking whether the key is present or not..****
>
> So thats why I thought of doing the binary search.****
>
> Any suggestions?****
>
> ** **
>
> ** **
>
+
Mahesh Balija 2013-01-08, 03:26
+
John Lilley 2013-01-08, 04:05
+
Pamecha, Abhishek 2013-01-07, 23:31
+
jamal sasha 2013-01-08, 09:14