Pig, mail # user - Possible check for speculative execution cancellation in finish() of storage UDF


Re: Possible check for speculative execution cancellation in finish() of storage UDF
Ashutosh Chauhan 2010-04-14, 00:14
Sandesh,

Which perf penalty are you trying to avoid? If you are writing the same
record from four different reducers (which will happen with speculative
execution turned on), you are only straining your DB.

Ashutosh
On Tue, Apr 13, 2010 at 17:12, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
> You don't have to do anything for that -- if a DB connection goes away, and
> the transaction is not committed, it will be rolled back.
>
> But this is a terrible idea for medium to large-sized data, or long-running
> tasks.
>
> I haven't looked at the patch, but I assume you would need to change how it
> works with transactions to get this to work.
>
> -D
>
> On Tue, Apr 13, 2010 at 5:04 PM, Sandesh Devaraju <
> [EMAIL PROTECTED]> wrote:
>
>> @Ashutosh: I am currently running the task with speculative execution
>> turned off, but was wondering if there is a way to avoid the
>> performance penalty.
>>
>> @Dmitriy: I would like to try out option 1 - any pointers on how to
>> infer this "killed" status in the UDF?
>>
>> On Tuesday, April 13, 2010, Dmitriy Ryaboy <[EMAIL PROTECTED]> wrote:
>> > Option 1: write everything in a given mapper in one big transaction, roll
>> > back if killed (this is obviously a performance killer).
>> >
>> > Option 2: on spin-up, the task creates a temporary table by copying the
>> > definition from the main table; the allFinished() method, or whatever we
>> > are calling it now, moves data from the temp tables of successful attempts
>> > into the main table. Also not awesome.
>> >
>> > Option 3: write to the fs, bulk import into a database at the end of your
>> > job. Safest, sanest, most parallelizable. See dependency tools like the
>> > recently open-sourced Azkaban for making life easier in that regard.
>> >
>> > -Dmitriy
>> >
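
A minimal sketch of what Option 1 above could look like inside a JDBC-backed
storer. This is a hedged illustration, not the PIG-1229 implementation: the
class and method names (TransactionalDbWriter, write, commitAndClose) are
hypothetical, and only the java.sql calls are standard API. The idea is to
open the connection with auto-commit off, write rows as they arrive, and
commit only when the attempt finishes, so a killed speculative attempt leaves
an uncommitted transaction that the database discards when its connection
drops.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper illustrating Option 1: one transaction per task attempt.
public class TransactionalDbWriter {
    private final Connection conn;
    private final PreparedStatement insert;

    public TransactionalDbWriter(String jdbcUrl, String table) throws SQLException {
        conn = DriverManager.getConnection(jdbcUrl);
        conn.setAutoCommit(false);              // nothing is visible until commit()
        insert = conn.prepareStatement("INSERT INTO " + table + " VALUES (?, ?)");
    }

    public void write(long key, String value) throws SQLException {
        insert.setLong(1, key);
        insert.setString(2, value);
        insert.executeUpdate();                 // stays inside the open transaction
    }

    // Called only from a successfully finishing attempt, e.g. finish().
    public void commitAndClose() throws SQLException {
        conn.commit();
        insert.close();
        conn.close();
    }
}

As Dmitriy warns, holding one open transaction for a whole task is a
performance killer for medium to large data, which is why Option 3 (write to
the filesystem, then bulk load) is the safer route.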
>> > On Tue, Apr 13, 2010 at 4:35 PM, Ashutosh Chauhan <
>> > [EMAIL PROTECTED]> wrote:
>> >
>> >> Sandesh,
>> >>
>> >> As a workaround you can set the property
>> >> mapred.[map|reduce].max.attempts to 1, which I believe will turn off
>> >> speculative execution. You can pass this as a -D switch on the pig
>> >> command line or through mapred-site.xml. The proper way to do it would
>> >> be the way you suggested (though that would be less performant as well
>> >> as more complex to implement). You may also want to comment on that
>> >> jira with your issue.
>> >>
>> >> Ashutosh
>> >>
>> >> On Tue, Apr 13, 2010 at 16:16, Sandesh Devaraju
>> >> <[EMAIL PROTECTED]> wrote:
>> >> > Hi All,
>> >> >
>> >> > I am using PIG-1229 to write pig query output to a database. However,
>> >> > I noticed that because of speculative execution, spurious records end
>> >> > up being written.
>> >> >
>> >> > I was wondering if there is a way to infer whether the current reduce
>> >> > task is running in a speculative attempt that was cancelled (and hence
>> >> > a rollback needs to be issued).
>> >> >
>> >> > Thanks in advance!
>> >> >
>> >> > - Sandesh
>> >> >
>> >>
>> >
>>
>
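
For reference on the workaround Ashutosh mentions: in Hadoop of this vintage,
speculative execution is governed by the job properties
mapred.map.tasks.speculative.execution and
mapred.reduce.tasks.speculative.execution (distinct from
mapred.map.max.attempts / mapred.reduce.max.attempts, which cap how many times
a failed attempt is retried). Property names changed in later Hadoop releases,
so verify against your cluster's version. Passed as -D switches on the pig
command line, per the suggestion above, a run with speculation off would look
roughly like:

pig -Dmapred.map.tasks.speculative.execution=false \
    -Dmapred.reduce.tasks.speculative.execution=false \
    myscript.pig

where myscript.pig stands in for your own script.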