Re: Hadoop Data Sharing
What objects are you referring to? I'm not sure I understand your question.
- Aaron

On Tue, May 11, 2010 at 6:38 AM, Renato Marroquín Mogrovejo <
[EMAIL PROTECTED]> wrote:

> Thanks Aaron! I was thinking the same after doing some reading.
> Man, what about serializing the objects? Do you think that would be a good idea?
> Thanks again.
>
> Renato M.
>
>
> 2010/5/5 Aaron Kimball <[EMAIL PROTECTED]>
>
> > Renato,
> >
> > In general, if you need to perform a multi-pass MapReduce workflow,
> > each pass materializes its output to files. The subsequent pass then
> > reads those same files back in as input. This allows the workflow to
> > restart from the last "checkpoint" if it gets interrupted. There is
> > no persistent in-memory distributed storage feature in Hadoop that
> > would allow a MapReduce job to post results to memory for consumption
> > by a subsequent job.
> >
> > So you would just read your initial data from /input, and write your
> > interim results to /iteration0. Then the next pass reads from
> > /iteration0 and writes to /iteration1, etc. [A driver sketch of this
> > loop follows the quoted thread below.]
> >
> > If your data is reasonably small and you think it could fit in memory
> > somewhere, then you could experiment with using other distributed
> > key-value stores (memcached, HBase, Cassandra, etc.) to hold
> > intermediate results. But this will require some integration work on
> > your part. [A reducer sketch along those lines also follows below.]
> > - Aaron
> >
> > On Wed, May 5, 2010 at 8:29 AM, Renato Marroquín Mogrovejo <
> > [EMAIL PROTECTED]> wrote:
> >
> > > Hi everyone, I have recently started playing around with Hadoop,
> > > but I am running into some "design" problems.
> > > I need to loop and execute the same job several times, and in each
> > > iteration retrieve the processed values (not via a file, because I
> > > would then need to read it back in). I was using a static vector in
> > > my main class (the one that iterates and executes the job in each
> > > iteration) to retrieve those values, and it worked while I was
> > > running in standalone mode. Now I have tried it in pseudo-distributed
> > > mode, and obviously it is not working.
> > > Any suggestions, please?
> > >
> > > Thanks in advance,
> > >
> > >
> > > Renato M.
> > >
> >
>
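
A minimal sketch of the driver-side loop Aaron describes, assuming the
org.apache.hadoop.mapreduce API of that era: each pass writes its files to
/iterationN, and the next pass reads them back as input, so a rerun can
resume from the last completed pass. MyMapper and MyReducer are hypothetical
placeholders, and the Text key/value types and the pass count of 5 are
illustrative assumptions, not anything from the original exchange.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class IterativeDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path("/input");  // initial data, per the thread
        int passes = 5;                   // assumed number of iterations

        for (int i = 0; i < passes; i++) {
          Path output = new Path("/iteration" + i);

          Job job = new Job(conf, "pass-" + i);
          job.setJarByClass(IterativeDriver.class);
          job.setMapperClass(MyMapper.class);    // hypothetical mapper
          job.setReducerClass(MyReducer.class);  // hypothetical reducer
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);

          FileInputFormat.setInputPaths(job, input);
          FileOutputFormat.setOutputPath(job, output);

          // Each pass materializes its output to files; if this pass
          // fails, a rerun can restart from the last /iterationN that
          // completed.
          if (!job.waitForCompletion(true)) {
            System.exit(1);
          }

          input = output; // next pass reads what this pass just wrote
        }
      }
    }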
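
And a sketch of the kind of integration work Aaron's last paragraph alludes
to, here using HBase as the key-value store for intermediate results: a
reducer that, besides the usual file output, puts each aggregate into a
table that a later job (or the driver) can read without scanning files. The
table name "interim", the column family and qualifier, and the summing
reduce logic are all assumptions for illustration; the calls themselves are
the classic HBase client API (HTable, Put).

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Hypothetical reducer: sums the values for each key and mirrors the
    // result into the (assumed) HBase table "interim" for the next job.
    public class HBaseSinkReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

      private HTable table;

      @Override
      protected void setup(Context context) throws IOException {
        table = new HTable(HBaseConfiguration.create(), "interim");
      }

      @Override
      protected void reduce(Text key, Iterable<IntWritable> values,
          Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
          sum += v.get();
        }

        // Row key = the reduce key; column r:sum holds the aggregate.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.add(Bytes.toBytes("r"), Bytes.toBytes("sum"),
            Bytes.toBytes(sum));
        table.put(put);

        // Still write the normal file output so the workflow keeps its
        // HDFS "checkpoint" behavior.
        context.write(key, new IntWritable(sum));
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        table.close();
      }
    }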