HBase, mail # user - multiple puts in reducer?


Re: multiple puts in reducer?
Jacques 2012-02-28, 16:15
The key is that there are two output-commit strategies for a map-reduce
job: those that rely on the map-reduce framework's output committer, and
those that work outside of it.

Option 1: Rely on map-reduce for committing your output.  If you only
work within an existing FileOutputFormat and its associated
FileOutputCommitter, you don't have to worry about your outputs being
created twice.  Speculative execution is dealt with automatically at the
map-reduce layer.  (Only if a task attempt is successful is its output
promoted to the next stage.)
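
A minimal sketch of Option 1 (the class name, job name, and paths are
placeholders, not from the thread): the job writes through a
FileOutputFormat, so the FileOutputCommitter is what guards against
duplicate output from speculative or restarted attempts.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class CommitterBackedJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "committer-backed-output");
    job.setJarByClass(CommitterBackedJob.class);
    // Identity mapper/reducer stand in for real logic in this sketch.
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    // Each task attempt writes into its own temporary attempt directory;
    // only the attempt that succeeds has its files promoted into the real
    // output path on commit, so speculative execution cannot double-write.
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The same commit behavior is what makes the "write import files, then bulk
load into HBase" approach mentioned later in the thread safe.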

Option 2: Rely on your own semantics.  For example, create your own
HTable and start running puts and deletes.  In this case, you'd better make
sure that your actions are idempotent.  Speculative execution means the
same action may run multiple times.  Even if you disable speculative
execution, a task may still fail for other reasons and get restarted (for
example, if a tasktracker node is overcommitted on memory), so the first
part of a task's work may run multiple times anyway.  The only way to make
this work correctly is to ensure that your job's actions are idempotent.
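
A rough sketch of Option 2 (the table name, column family, and row-key
scheme here are invented): the reducer opens its own HTable and issues
Puts whose row keys and values are derived only from the input, so a
restarted or speculative attempt rewrites the same cells rather than
adding new ones.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DirectPutReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // One HTable per task attempt; the job configuration carries the
    // ZooKeeper quorum and other HBase connection settings.
    table = new HTable(HBaseConfiguration.create(context.getConfiguration()),
        "mytable");
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    // Row key and cell are derived purely from the input key, so if the
    // task runs twice it writes the same cell twice instead of creating a
    // duplicate row -- i.e. the write is idempotent.
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.add(Bytes.toBytes("f"), Bytes.toBytes("sum"), Bytes.toBytes(sum));
    table.put(put);
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.close();
  }
}

Counters, increments, appends, or row keys that include a timestamp or a
random component would break this property: a second attempt would then
add new data instead of overwriting the old.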

On Tue, Feb 28, 2012 at 7:22 AM, Ben Snively <[EMAIL PROTECTED]> wrote:

> I think you just need to turn the speculative execution off for that job?
> The speculative execution that I am referring to is when the job tracker
> executes multiple instances of the same task across the cluster.  It will
> do this when the cluster isn't busy and particular tasks are taking too
> long, to see if it can get the task completed quicker on another node in
> the cluster.
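
A short illustration of the switch Ben is suggesting, using the plain Job
API (the configuration and job name are placeholders, not from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NoSpeculationJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "reduce-side-puts");  // hypothetical job name
    // Turn speculative attempts off for this job only, so the framework
    // never launches a second copy of a map or reduce task that is merely
    // slow rather than failed.
    job.setMapSpeculativeExecution(false);
    job.setReduceSpeculativeExecution(false);
    // ... mapper, reducer, and input/output setup would follow here ...
  }
}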
>
> My fear was that if there was a mapreduce job running where a reduce task
> was being executed, speculative execution could cause two instances of
> that same reduce task to get executed -- to see which one would finish
> first.  That could have a different impact based on the use case and how
> the timestamp for the data being ingested into hbase was generated.
>
> Is this an issue or just me pretending to know more than I do?
>
> Thanks,
> Ben
>
>
>
> On Tue, Feb 28, 2012 at 10:06 AM, T Vinod Gupta <[EMAIL PROTECTED]>
> wrote:
>
> > thanks, that helps!!
> >
> > On Tue, Feb 28, 2012 at 7:02 AM, Tim Robertson <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >
> > > You can call context.write() multiple times in the Reduce(), to emit
> > > more than one row.
> > >
> > > If you are creating the Puts in the Map function then you need to
> > > setMapSpeculativeExecution(false) on the job conf, or else Hadoop
> > > *might* spawn more than 1 attempt for a given task, meaning you'll get
> > > duplicate data.
> > >
> > > HTH,
> > > Tim
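
A small sketch of what Tim describes above (the table, column family, and
row-key layout are invented for the example): each call to context.write()
inside a single reduce() emits one more Put, and TableOutputFormat sends
them all to HBase.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class MultiPutReducer
    extends TableReducer<Text, LongWritable, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    int i = 0;
    for (LongWritable value : values) {
      // One output row per input value: calling context.write() several
      // times in one reduce() produces several rows in the target table.
      byte[] row = Bytes.toBytes(key.toString() + "-" + i++);
      Put put = new Put(row);
      put.add(Bytes.toBytes("f"), Bytes.toBytes("v"), Bytes.toBytes(value.get()));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }
}

The reducer would be wired in with
TableMapReduceUtil.initTableReducerJob("mytable", MultiPutReducer.class, job),
which configures TableOutputFormat for the job ("mytable" standing in for
whatever the target table is).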
> > >
> > >
> > >
> > > On Tue, Feb 28, 2012 at 3:51 PM, T Vinod Gupta <[EMAIL PROTECTED]>
> > > wrote:
> > > > Ben,
> > > > I didn't quite understand your concern? What speculative execution
> > > > are you referring to?
> > > >
> > > > thanks
> > > > vinod
> > > >
> > > > On Tue, Feb 28, 2012 at 6:45 AM, Ben Snively <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > >> I think the short answer to that is yes, but the complex portion I
> > > >> would be worried about is the following:
> > > >>
> > > >>
> > > >> I guess along with that, how do you manage speculative execution on
> > > >> the reducer (or is that only for map tasks)?
> > > >>
> > > >> I've always ended up creating import files and bringing it into
> > > >> HBase.
> > > >>
> > > >> Thanks,
> > > >> Ben
> > > >>
> > > >> On Tue, Feb 28, 2012 at 9:34 AM, T Vinod Gupta <[EMAIL PROTECTED]>
> > > >> wrote:
> > > >>
> > > >> > while doing map reduce on hbase tables, is it possible to do
> > > >> > multiple puts in the reducer? what i want is a way to be able to
> > > >> > write multiple rows. if it's not possible, then what are the
> > > >> > other alternatives? i mean like creating a wider table in that
> > > >> > case.
> > > >> >
> > > >> > thanks
> > > >> >
> > > >>
> > >
> >
>