Michael Parker 2012-08-23, 04:42
Harsh J 2012-08-23, 05:27
Michael Parker 2012-08-23, 06:57
Michael Parker 2012-08-23, 22:57
Re: Side-loading output from one MR into another?
Serge Blazhiyevskyy 2012-08-23, 23:03
I have a map-side join example here:
http://askhadoop.blogspot.com/2011/12/map-side-join_27.html

It is a great way to load data into memory on multiple machines.

Regards,
Serge

On 8/23/12 3:57 PM, "Michael Parker" <[EMAIL PROTECTED]> wrote:

>Actually, I was able to do some tricks and reduce the size to
>something that can be held in memory.
>
>Nonetheless, if anyone has an example of or more information about a
>map-side join, I would love to see it.
>
>Thanks!
>
>- Mike
>
>
>On Wed, Aug 22, 2012 at 11:57 PM, Michael Parker
><[EMAIL PROTECTED]> wrote:
>> Thanks for the prompt reply!
>>
>> Unfortunately, it's not that small.
>>
>> I'm using the new API; are map side joins accomplished using
>>
>>http://hadoop.apache.org/common/docs/r1.0.3/api/org/apache/hadoop/contrib/utils/join/package-summary.html?
>> Are there any examples which use this package or map side joins?
>>
>> The way I was thinking of doing it was to output the user-to-cohort
>> mapping from the first MR as a SequenceFile, and then each mapper in
>> the second MR could use a SequenceFile.Reader to find the cohort for a
>> user. It seems reasonable, but is this actually doable? It's like a
>> manual map-side join, I suppose, although likely not as elegant as
>> what you were proposing.
>>
>> Thanks,
>> Mike
>>
>> On Wed, Aug 22, 2012 at 10:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> If it is a small set, you can load it onto distributed cache and then
>>> onto the task's memory, or if it's pretty big, perhaps you can do a
>>> map-side join?
>>>
>>> On Thu, Aug 23, 2012 at 10:12 AM, Michael Parker
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi all,
>>>>
>>>> Is it possible to take a collection of sorted key-value pairs,
>>>> generated from one MapReduce, and side-load them into another
>>>> MapReduce, i.e. as it runs, the second MapReduce can look up the value
>>>> for a given key computed by the first MapReduce?
>>>>
>>>> I need this for a cohort study -- one MR puts users into cohorts, and
>>>> the second MR needs that user-to-cohort mapping to see how cohorts
>>>> behave over time.
>>>>
>>>> Any help would be greatly appreciated. Thanks!
>>>>
>>>> - Mike
>>>
>>>
>>>
>>> --
>>> Harsh J
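
For the small-set case discussed in this thread (load the first job's output into each task's memory, then look up the cohort per user), a minimal sketch in plain Java might look like the following. It deliberately leaves out the Hadoop wiring so that it runs on its own; in a real job the reader would presumably be opened on a file shipped through the distributed cache, for example from the mapper's setup method. The class name CohortLookup, the tab-separated "user<TAB>cohort" line format, and the sample users and cohort labels are all assumptions made for illustration, not something taken from the thread.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps the user-to-cohort mapping in memory. */
final class CohortLookup {
    private final Map<String, String> cohortByUser = new HashMap<>();

    /**
     * Reads "user<TAB>cohort" lines from the first job's output.
     * The line format is an assumption; lines without a tab are skipped.
     */
    static CohortLookup load(BufferedReader in) {
        CohortLookup lookup = new CohortLookup();
        in.lines().forEach(line -> {
            int tab = line.indexOf('\t');
            if (tab >= 0) {
                lookup.cohortByUser.put(line.substring(0, tab),
                                        line.substring(tab + 1));
            }
        });
        return lookup;
    }

    /** Returns the cohort for a user, or null if the user is unknown. */
    String cohortFor(String user) {
        return cohortByUser.get(user);
    }

    /** Number of users loaded. */
    int size() {
        return cohortByUser.size();
    }

    public static void main(String[] args) {
        // Illustrative stand-in for the first job's output file.
        String firstJobOutput = "alice\t2012-07\nbob\t2012-08\n";
        CohortLookup lookup =
            CohortLookup.load(new BufferedReader(new StringReader(firstJobOutput)));

        // In the second job, each map() call would do a lookup like this.
        System.out.println("alice is in cohort " + lookup.cohortFor("alice"));
    }
}
```

This only works while the whole mapping fits in a task's heap. Once it does not, a map-side join over sorted, identically partitioned inputs (as suggested above) is the usual alternative.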