HBase >> mail # user >> multiple puts in reducer?


Thread:
T Vinod Gupta 2012-02-28, 14:34
Tim Robertson 2012-02-28, 14:44
T Vinod Gupta 2012-02-28, 14:50
Ben Snively 2012-02-28, 14:45
T Vinod Gupta 2012-02-28, 14:51
Tim Robertson 2012-02-28, 15:02
T Vinod Gupta 2012-02-28, 15:06
Ben Snively 2012-02-28, 15:22
T Vinod Gupta 2012-02-28, 15:25
Jacques 2012-02-28, 16:15
Re: multiple puts in reducer?
Let me append this.

Having just looked at the code for TableOutputFormat, I must correct
myself.  TableOutputFormat does a direct commit, so it falls under case 2.

So the only way to ensure that your output from a job is safe using
TableOutputFormat is to make sure the actions you're doing are idempotent.

To avoid this problem, you would need to use an output that correctly
supports commit.
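
To make the idempotency requirement concrete, here is a standalone sketch (plain Java, not the HBase API -- the put/increment helpers below are hypothetical stand-ins for HBase operations): replaying a put that carries a fixed row key, column, and timestamp converges to the same table state, while replaying an increment does not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class IdempotencyDemo {
    // Stand-in for an HBase table: (rowKey:column:timestamp) -> value.
    // Writing the same cell coordinates twice leaves the same state.
    static void put(Map<String, String> table, String row, String col, long ts, String val) {
        table.put(row + ":" + col + ":" + ts, val);
    }

    // Non-idempotent: every replay changes the stored value.
    static void increment(Map<String, Long> counters, String key, long delta) {
        counters.merge(key, delta, Long::sum);
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        put(table, "row1", "cf:count", 1330440000000L, "42");
        Map<String, String> snapshot = new HashMap<>(table);
        put(table, "row1", "cf:count", 1330440000000L, "42"); // speculative retry
        System.out.println("put idempotent: " + table.equals(snapshot));

        Map<String, Long> counters = new HashMap<>();
        increment(counters, "row1", 1);
        increment(counters, "row1", 1); // retry doubles the count
        System.out.println("increment idempotent: " + Objects.equals(counters.get("row1"), 1L));
    }
}
```

So a reducer that emits Puts with explicitly set timestamps survives a duplicate task attempt, whereas one that issues Increments does not.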

Jacques

On Tue, Feb 28, 2012 at 8:15 AM, Jacques <[EMAIL PROTECTED]> wrote:

> The key is that there are two output commit strategies for a map reduce
> job.  Those that follow the map reduce paradigm and those that work outside
> of it.
>
> Option 1: Rely on map-reduce for committing your output: If you only
> work within an existing FileOutputFormat and associated
> FileOutputCommitter, you don't have to worry about your outputs being
> double created.  Speculative execution is automatically dealt with at the
> map reduce layer.  (Only if a phase is successful is the output pushed to
> the next stage).
>
> Option 2: Rely on your own semantics.  For example, generate your own
> HTable and start running puts and deletes.  In this case, you better make
> sure that your actions are idempotent.  Speculative execution means the
> same action may run multiple times.  Even if you disable spec. ex., a task
> may fail due to other problems and get restarted.  (For example if a
> tasktracker node is overcommitted on memory.)  In this case, the first
> part of your job may run multiple times even if you disable speculative
> execution.  The only way to make this work correctly is to ensure that your
> job actions are idempotent.
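
The commit protocol Option 1 relies on can be sketched in miniature (a simplified model with an assumed file layout, not the actual FileOutputCommitter code): each attempt writes under a temporary directory, and only one successful attempt is renamed into the final output location, so a duplicate speculative attempt is simply discarded rather than double-counted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CommitSketch {
    // Each task attempt writes to its own temporary directory.
    static void writeAttempt(Path outputDir, String attemptId, String data) throws IOException {
        Path attemptDir = outputDir.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(attemptDir);
        Files.writeString(attemptDir.resolve("part-r-00000"), data);
    }

    // Only the attempt that succeeds first is promoted into the final output
    // directory; the losing speculative attempt's directory is never committed.
    static void commitAttempt(Path outputDir, String attemptId) throws IOException {
        Path src = outputDir.resolve("_temporary").resolve(attemptId).resolve("part-r-00000");
        Files.move(src, outputDir.resolve("part-r-00000"), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        writeAttempt(out, "attempt_0", "row1,42\n"); // original attempt
        writeAttempt(out, "attempt_1", "row1,42\n"); // speculative duplicate
        commitAttempt(out, "attempt_0");             // commit exactly one attempt
        System.out.println("final output: " + Files.readString(out.resolve("part-r-00000")).trim());
    }
}
```

A direct-write output like TableOutputFormat has no equivalent of the rename step, which is why its actions must be idempotent instead.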
>
>
>
> On Tue, Feb 28, 2012 at 7:22 AM, Ben Snively <[EMAIL PROTECTED]> wrote:
>
>> I think you just need to turn speculative execution off for that job.
>> The speculative execution that I am referring to is when the job tracker
>> executes multiple instances of the same task across the cluster.
>> It will do this when the cluster isn't busy and particular tasks are
>> taking too long, to see if it can get the task completed quicker on another
>> node in the cluster.
>>
>> My fear was that if a mapreduce job was running and a reduce task was
>> being executed, speculative execution could cause two instances of that
>> same reduce task to be executed -- to see which one would finish first.
>> That could have a different impact depending on the use case and how the
>> timestamp for the data being ingested into hbase was generated.
>>
>> Is this an issue or just me pretending to know more than I do?
>>
>> Thanks,
>> Ben
>>
>>
>>
>> On Tue, Feb 28, 2012 at 10:06 AM, T Vinod Gupta <[EMAIL PROTECTED]> wrote:
>>
>> > thanks, that helps!!
>> >
>> > On Tue, Feb 28, 2012 at 7:02 AM, Tim Robertson <[EMAIL PROTECTED]> wrote:
>> >
>> > > Hi,
>> > >
>> > > You can call context.write() multiple times in reduce() to emit
>> > > more than one row.
>> > >
>> > > If you are creating the Puts in the Map function then you need to
>> > > setMapSpeculativeExecution(false) on the job conf, or else Hadoop
>> > > *might* spawn more than 1 attempt for a given task, meaning you'll get
>> > > duplicate data.
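
For reference, speculative execution can also be disabled through job configuration properties instead of the Job API. With the Hadoop releases of this era the property names were as below (later releases renamed them to mapreduce.map.speculative and mapreduce.reduce.speculative):

```xml
<!-- Disable speculative attempts for both phases. Note this does not remove
     the need for idempotent puts: a task that fails for other reasons is
     still retried. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```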
>> > >
>> > > HTH,
>> > > Tim
>> > >
>> > >
>> > >
>> > > On Tue, Feb 28, 2012 at 3:51 PM, T Vinod Gupta <[EMAIL PROTECTED]> wrote:
>> > > > Ben,
>> > > > I didn't quite understand your concern? What speculative execution
>> are
>> > > you
>> > > > referring to?
>> > > >
>> > > > thanks
>> > > > vinod
>> > > >
>> > > > On Tue, Feb 28, 2012 at 6:45 AM, Ben Snively <[EMAIL PROTECTED]> wrote:
>> > > >
>> > > >> I think the short answer to that is yes, but the complex portion
>> > > >> I would be worried about is the following:
>> > > >>
>> > > >> I guess along with that, how do you manage speculative execution
>> > > >> on the reducer (or is that only for map tasks)?
>> > > >>
>> > > >> I've always ended up creating import files and bringing it into
>
Michel Segel 2012-02-28, 15:44
T Vinod Gupta 2012-02-28, 16:14
Michael Segel 2012-02-28, 16:20
Ben Snively 2012-02-28, 17:40
Jacques 2012-02-29, 05:16
Michel Segel 2012-02-29, 13:18
Ben Snively 2012-02-29, 13:21
Michel Segel 2012-02-29, 13:04
Jacques 2012-03-01, 17:28