MapReduce >> mail # user >> Side-loading output from one MR into another?

Michael Parker 2012-08-23, 04:42
Harsh J 2012-08-23, 05:27
Michael Parker 2012-08-23, 06:57
Michael Parker 2012-08-23, 22:57
Re: Side-loading output from one MR into another?
I have a map-side join example here


It is a great way to load data into memory on multiple machines.
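
For readers following the thread, here is a minimal sketch of the in-memory variant discussed below: load the user-to-cohort mapping once per task, then look each user up while processing records. It is plain Java rather than Hadoop code, and it is not the example linked above. The class name CohortLookup, the method names, and the tab-separated "user<TAB>cohort" line format are all assumptions made for illustration. In a real job the load step would typically run once per task (for example in Mapper.setup()) over a file shipped through the distributed cache.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the second job's mapper; uses no Hadoop classes.
class CohortLookup {

    // Parse "user<TAB>cohort" lines (an assumed format for the first job's output).
    // Malformed lines without a tab are skipped.
    static Map<String, String> loadCohorts(List<String> lines) {
        Map<String, String> cohorts = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                cohorts.put(parts[0], parts[1]);
            }
        }
        return cohorts;
    }

    // Per-record work: turn "user<TAB>payload" into "cohort<TAB>payload".
    // Users missing from the mapping are tagged UNKNOWN.
    static String tagWithCohort(Map<String, String> cohorts, String eventLine) {
        String[] parts = eventLine.split("\t", 2);
        String cohort = cohorts.getOrDefault(parts[0], "UNKNOWN");
        String payload = parts.length == 2 ? parts[1] : "";
        return cohort + "\t" + payload;
    }

    public static void main(String[] args) {
        Map<String, String> cohorts =
            loadCohorts(Arrays.asList("alice\t2012-07", "bob\t2012-08"));
        System.out.println(tagWithCohort(cohorts, "alice\tlogin"));
        System.out.println(tagWithCohort(cohorts, "carol\tlogin"));
    }
}
```

This only works when the whole mapping fits in each task's heap, which is the condition Mike describes reaching after shrinking the data.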

On 8/23/12 3:57 PM, "Michael Parker" <[EMAIL PROTECTED]> wrote:

>Actually, I was able to do some tricks and reduce the size to
>something that can be held in memory.
>Nonetheless, if anyone has an example of or more information about a
>map-side join, I would love to see it.
>- Mike
>On Wed, Aug 22, 2012 at 11:57 PM, Michael Parker
>> Thanks for the prompt reply!
>> Unfortunately, it's not that small.
>> I'm using the new API; are map side joins accomplished using
>> Are there any examples which use this package or map side joins?
>> The way I was thinking of doing it was to output the user-to-cohort
>> mapping from the first MR as a SequenceFile, and then each mapper in
>> the second MR could use a SequenceFile.Reader to find the cohort for a
>> user. It seems reasonable, but is this actually doable? It's like a
>> manual map-side join, I suppose, although likely not as elegant as
>> what you were proposing.
>> Thanks,
>> Mike
>> On Wed, Aug 22, 2012 at 10:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>>> If it is a small set, you can load it onto distributed cache and then
>>> onto the task's memory, or if it's pretty big, perhaps you can do a
>>> map-side join?
>>> On Thu, Aug 23, 2012 at 10:12 AM, Michael Parker
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi all,
>>>> Is it possible to take a collection of sorted key-value pairs,
>>>> generated from one MapReduce, and side-load them into another
>>>> MapReduce, i.e. as it runs, the second MapReduce can look up the value
>>>> for a given key computed by the first MapReduce?
>>>> I need this for a cohort study -- one MR puts users into cohorts, and
>>>> the second MR needs that user-to-cohort mapping to see how cohorts
>>>> behave over time.
>>>> Any help would be greatly appreciated. Thanks!
>>>> - Mike
>>> --
>>> Harsh J
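
When the mapping is too large for memory, the idea behind a map-side join is that both inputs are already sorted by the same key, so they can be merged in a single pass without holding either one in memory. The sketch below illustrates that merge in plain Java. It is not Hadoop's join API, and the class name SortedMergeJoin, the {key, value} row layout, and the assumption that each user appears at most once in the cohort input are all hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative merge of two key-sorted inputs; uses no Hadoop classes.
class SortedMergeJoin {

    // Each row is {key, value}. Both lists must be sorted by key, and keys in
    // `cohorts` are assumed unique. Events may repeat a key. Events whose user
    // has no cohort are dropped (inner join).
    static List<String> join(List<String[]> cohorts, List<String[]> events) {
        List<String> out = new ArrayList<>();
        int i = 0; // cursor into cohorts; it only ever moves forward
        for (String[] event : events) {
            // Skip cohort rows whose key sorts before the current event's key.
            while (i < cohorts.size() && cohorts.get(i)[0].compareTo(event[0]) < 0) {
                i++;
            }
            if (i < cohorts.size() && cohorts.get(i)[0].equals(event[0])) {
                out.add(event[0] + "\t" + cohorts.get(i)[1] + "\t" + event[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> cohorts = Arrays.asList(
            new String[] {"alice", "A"},
            new String[] {"bob", "B"},
            new String[] {"dave", "D"});
        List<String[]> events = Arrays.asList(
            new String[] {"alice", "login"},
            new String[] {"alice", "click"},
            new String[] {"carol", "login"},
            new String[] {"dave", "login"});
        for (String row : join(cohorts, events)) {
            System.out.println(row);
        }
    }
}
```

In a real job the two inputs would also need to be partitioned identically, so that each map task sees matching key ranges from both sides.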