Re: Hadoop Data Sharing
What objects are you referring to? I'm not sure I understand your question.
- Aaron

On Tue, May 11, 2010 at 6:38 AM, Renato Marroquín Mogrovejo <
[EMAIL PROTECTED]> wrote:

> Thanks Aaron! I was thinking the same after doing some reading.
> What about serializing the objects? Do you think that would be a good idea?
> Thanks again.
>
> Renato M.
>
>
> 2010/5/5 Aaron Kimball <[EMAIL PROTECTED]>
>
> > Renato,
> >
> > In general, if you need to perform a multi-pass MapReduce workflow, each
> > pass materializes its output to files. The subsequent pass then reads
> > those same files back in as input. This allows the workflow to restart
> > from the last "checkpoint" if it gets interrupted. There is no persistent
> > in-memory distributed storage feature in Hadoop that would allow a
> > MapReduce job to post results to memory for consumption by a subsequent
> > job.
> >
> > So you would just read your initial data from /input, and write your
> > interim results to /iteration0. Then the next pass reads from /iteration0
> > and writes to /iteration1, etc.
> >
> > If your data is reasonably small and you think it could fit in memory
> > somewhere, then you could experiment with using other distributed
> > key-value stores (memcached, HBase, Cassandra, etc.) to hold intermediate
> > results. But this will require some integration work on your part.
> > - Aaron
> >
> > On Wed, May 5, 2010 at 8:29 AM, Renato Marroquín Mogrovejo <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hi everyone, I have recently started to play around with Hadoop, but I
> > > am running into some "design" problems.
> > > I need to loop, executing the same job several times, and in each
> > > iteration get the processed values (not using a file, because I would
> > > need to read it). I was using a static vector in my main class (the one
> > > that iterates and executes the job in each iteration) to retrieve those
> > > values, and it worked while I was using standalone mode. Now I have
> > > tried to test it in pseudo-distributed mode and, obviously, it is not
> > > working.
> > > Any suggestions, please?
> > >
> > > Thanks in advance,
> > >
> > >
> > > Renato M.
> > >
> >
>
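
The directory chaining Aaron describes (read from /input, write to /iteration0, then read /iteration0 and write /iteration1, and so on) can be sketched as a small driver loop. This is only a local illustration under stated assumptions, not Hadoop code: the `Pass` interface, `incrementAll`, `iterationDir`, `runAll`, and the `_SUCCESS` marker file are all hypothetical names chosen for this sketch, and plain `java.nio.file` calls stand in for a real MapReduce job and HDFS. In an actual driver, each `pass.run(...)` would presumably be replaced by configuring and submitting a job with those input and output paths.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class IterativeDriver {

    /** Hypothetical stand-in for one MapReduce pass: reads inputDir, writes outputDir. */
    interface Pass {
        void run(Path inputDir, Path outputDir) throws IOException;
    }

    /** Directory that pass number i writes to, e.g. base/iteration0. */
    static Path iterationDir(Path base, int i) {
        return base.resolve("iteration" + i);
    }

    /**
     * Runs the same pass several times, feeding each pass the previous
     * pass's output directory. A pass whose output already carries the
     * marker file is skipped, so an interrupted workflow resumes at the
     * last completed "checkpoint". Returns the final output directory.
     */
    static Path runAll(Path base, Path input, int passes, Pass pass) throws IOException {
        Path current = input;
        for (int i = 0; i < passes; i++) {
            Path out = iterationDir(base, i);
            Path marker = out.resolve("_SUCCESS"); // assumed marker name for this sketch
            if (!Files.exists(marker)) {
                pass.run(current, out);
                Files.createFile(marker);
            }
            current = out; // the next pass reads what this one wrote
        }
        return current;
    }

    /** Example pass (illustrative only): adds 1 to every number found in part-* files. */
    static void incrementAll(Path inputDir, Path outputDir) throws IOException {
        List<String> result = new ArrayList<>();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(inputDir, "part-*")) {
            for (Path part : parts) {
                for (String line : Files.readAllLines(part)) {
                    result.add(Long.toString(Long.parseLong(line.trim()) + 1));
                }
            }
        }
        Files.createDirectories(outputDir);
        Files.write(outputDir.resolve("part-00000"), result);
    }
}
```

Calling `IterativeDriver.runAll(base, input, 3, IterativeDriver::incrementAll)` would leave the final values under `base/iteration2`. Because every interim result lives in its own directory, the driver reads values back from files rather than from a static field, which is what breaks once the tasks run in separate JVMs.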