Pig, mail # dev - Re: Exception Handling in Pig Scripts


Re: Exception Handling in Pig Scripts
Koji Noguchi 2011-01-18, 18:48
If we're talking about a couple of bad records, can we directly use the skip-record feature in MapReduce?

Koji
On 1/18/11 10:27 AM, "Julien Le Dem" <[EMAIL PROTECTED]> wrote:

That would be nice.
Also letting the error handler output the result to a relation would be useful.
(To let the script output application error metrics)
For example it could (optionally) use the keyword INTO just like the SPLIT operator.

FOO = LOAD ...;
A = FOREACH FOO GENERATE Bar(*) ON_ERROR SPLIT MyHandler INTO A_ERRORS;

ErrorHandler would look a little more like EvalFunc:

public interface ErrorHandler<T> {

  public T handle(IOException ioe, EvalFunc evalFunc, Tuple input) throws IOException;

  public Schema outputSchema(Schema input);

}

There could be a built-in handler to output the skipped record (input: tuple, funcname:chararray, errorMessage:chararray)

A = FOREACH FOO GENERATE Bar(*) ON_ERROR SPLIT INTO A_ERRORS;
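
As a rough illustration of the built-in handler described above, here is a minimal, self-contained Java sketch. It is not Pig code: Pig's EvalFunc and Tuple are replaced by plain Java stand-ins (a hypothetical Func interface and List<Object>), outputSchema is left out, and every class name is invented for the example. It only shows the idea of turning a failed record into an (input, funcname, errorMessage) row that lands in a separate error output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkippedRecordSketch {

    /** Stand-in for the proposed ErrorHandler<T>; Pig types are replaced by plain Java types. */
    interface ErrorHandler<T> {
        T handle(IOException ioe, String funcName, List<Object> input) throws IOException;
    }

    /** Stand-in for a UDF such as Bar in the thread (hypothetical, not Pig's EvalFunc). */
    interface Func {
        Object exec(List<Object> input) throws IOException;
    }

    /** One row of the error relation: (input, funcname, errorMessage). */
    static final class SkippedRecord {
        final List<Object> input;
        final String funcName;
        final String errorMessage;

        SkippedRecord(List<Object> input, String funcName, String errorMessage) {
            this.input = input;
            this.funcName = funcName;
            this.errorMessage = errorMessage;
        }
    }

    /** Hypothetical built-in handler: records the failure instead of rethrowing it. */
    static final class SkippedRecordHandler implements ErrorHandler<SkippedRecord> {
        @Override
        public SkippedRecord handle(IOException ioe, String funcName, List<Object> input) {
            return new SkippedRecord(input, funcName, ioe.getMessage());
        }
    }

    /** Results of one pass: the normal output (A) and the error output (A_ERRORS). */
    static final class Result {
        final List<Object> good = new ArrayList<>();
        final List<SkippedRecord> errors = new ArrayList<>();
    }

    /** Applies func to each row, routing IOExceptions through the handler. */
    static Result apply(Func func, String funcName, List<List<Object>> rows,
                        ErrorHandler<SkippedRecord> handler) throws IOException {
        Result result = new Result();
        for (List<Object> row : rows) {
            try {
                result.good.add(func.exec(row));
            } catch (IOException ioe) {
                result.errors.add(handler.handle(ioe, funcName, row));
            }
        }
        return result;
    }

    /** Toy UDF: parses the first field as an integer. */
    static Object parseFirstField(List<Object> input) throws IOException {
        try {
            return Integer.parseInt(String.valueOf(input.get(0)));
        } catch (NumberFormatException nfe) {
            throw new IOException("not a number: " + input.get(0), nfe);
        }
    }

    /** Runs the toy UDF over three rows, one of which is bad. */
    static Result demo() {
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("1"),
                Arrays.<Object>asList("oops"),
                Arrays.<Object>asList("3"));
        try {
            return apply(SkippedRecordSketch::parseFirstField, "Bar", rows,
                    new SkippedRecordHandler());
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        Result r = demo();
        System.out.println(r.good.size() + " good, " + r.errors.size() + " skipped");
    }
}
```

Because the handler returns a value rather than void, the caller can collect those values into a second output, which is roughly what the INTO A_ERRORS clause would expose to the script.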

Julien

On 1/16/11 12:22 AM, "Dmitriy Ryaboy" <[EMAIL PROTECTED]> wrote:

I was thinking about this..

We add an optional ON_ERROR clause to operators, which allows a user to
specify error handling. The error handler would be a udf that would
implement an interface along these lines:

public interface ErrorHandler {

  public void handle(IOException ioe, EvalFunc evalFunc, Tuple input) throws IOException;

}

I think it makes sense not to make this a static method, so that users could
keep required state and, for example, have the handler throw its own
IOException if it's been invoked too many times.
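
To make the stateful-handler idea concrete, here is a minimal Java sketch of a handler that tolerates a fixed number of errors and then throws its own IOException. It is an assumption-laden illustration, not Pig code: the interface is a stand-in that takes a function name and a plain Object instead of Pig's EvalFunc and Tuple, and MaxErrorsHandler and tripsAfter are hypothetical names.

```java
import java.io.IOException;

public class ThresholdHandlerSketch {

    /** Stand-in for the proposed interface; Pig types are replaced by plain Java types. */
    interface ErrorHandler {
        void handle(IOException ioe, String funcName, Object input) throws IOException;
    }

    /** Tolerates up to maxErrors failures, then throws its own IOException. */
    static final class MaxErrorsHandler implements ErrorHandler {
        private final int maxErrors;
        private int seen = 0; // instance state, which a static method could not keep per handler

        MaxErrorsHandler(int maxErrors) {
            this.maxErrors = maxErrors;
        }

        @Override
        public void handle(IOException ioe, String funcName, Object input) throws IOException {
            seen++;
            if (seen > maxErrors) {
                throw new IOException(
                        funcName + " failed " + seen + " times (limit " + maxErrors + ")", ioe);
            }
            // Otherwise swallow the error so the record is simply skipped.
        }

        int errorsSeen() {
            return seen;
        }
    }

    /** Feeds the given number of failures to the handler; returns true if it gave up. */
    static boolean tripsAfter(MaxErrorsHandler handler, int failures) {
        for (int i = 0; i < failures; i++) {
            try {
                handler.handle(new IOException("bad record " + i), "Bar", "row-" + i);
            } catch (IOException tooMany) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        MaxErrorsHandler handler = new MaxErrorsHandler(2);
        System.out.println("gave up after 2 errors: " + tripsAfter(handler, 2));
        System.out.println("gave up after 1 more:   " + tripsAfter(handler, 1));
    }
}
```

The counter here is per handler instance; in a real MapReduce job each task would presumably hold its own instance, so a job-wide threshold would need support from the system, as discussed elsewhere in this thread.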

D
On Sat, Jan 15, 2011 at 11:53 PM, Santhosh Srinivasan <[EMAIL PROTECTED]> wrote:

> Thanks for the clarification Ashutosh.
>
> Implementing this in the user realm is tricky as Dmitriy states.
> Sensitivity to error thresholds will require support from the system. We can
> probably provide a taxonomy of records (good, bad, incomplete, etc.) to let
> users classify each record. The system can then track counts of each record
> type to facilitate the computation of thresholds. The last part is to allow
> users to specify thresholds and appropriate actions (interrupt, exit,
> continue, etc.). A possible mechanism to realize this is the
> ErrorHandlingUDF described by Dmitriy.
>
> Santhosh
>
> -----Original Message-----
> From: Ashutosh Chauhan [mailto:[EMAIL PROTECTED]]
> Sent: Friday, January 14, 2011 7:35 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Exception Handling in Pig Scripts
>
> Santhosh,
>
> The way you are proposing, it will kill the Pig script. I think what the user
> wants is to ignore a few "bad records", process the rest, and get
> results. The problem here is how to let the user tell Pig the definition of a "bad
> record" and how to let them specify the threshold for % of bad records at which
> Pig should fail the script.
>
> Ashutosh
>
> On Fri, Jan 14, 2011 at 18:18, Santhosh Srinivasan <[EMAIL PROTECTED]>
> wrote:
> > Sorry about the late response.
> >
> > Hadoop n00b is proposing a language extension for error handling, similar
> > to the mechanisms in other well known languages like C++, Java, etc.
> >
> > For now, can't the error semantics be handled by the UDF? For exceptional
> > scenarios you could throw an ExecException with the right details. The
> > physical operator that handles the execution of UDFs traps it for you and
> > propagates the error back to the client. You can take a look at any of the
> > builtin UDFs to see how Pig handles it internally.
> >
> > Santhosh
> >
> > -----Original Message-----
> > From: Dmitriy Ryaboy [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, January 11, 2011 10:41 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Exception Handling in Pig Scripts
> >
> > Right now error handling is controlled by the UDFs themselves, and there
> > is no way to direct it externally.
> > You can make an ErrorHandlingUDF that would take a udf spec, invoke it,
> > trap errors, and then do the specified error handling behavior.. that's a
> > bit ugly though.
> >
> > There is a problem with trapping general exceptions of course, in that if