Pig >> mail # user >> What should storefuncs do on parse errors while reading?


Re: What should storefuncs do on parse errors while reading?
I typically increment a counter and have a bounded log of randomly sampled
erroneous data.

stan
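
A minimal sketch of that pattern in plain Java, in case it helps: count every bad record, keep a bounded uniform random sample of them (reservoir sampling, Algorithm R), and return null instead of failing the job. The class name `BadRecordSampler`, the `parseOrNull` helper, and the use of `Integer.parseInt` as the "parser" are all hypothetical stand-ins, not anything from AvroStorage. In a real loader the count would presumably go to a Hadoop counter rather than a plain field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: counts bad records and keeps a bounded random sample of them. */
final class BadRecordSampler {
    private final int capacity;          // maximum number of bad records retained for logging
    private final Random random;
    private final List<String> sample;
    private long badCount = 0;           // stand-in for a Hadoop counter (assumption)

    BadRecordSampler(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = new Random(seed);
        this.sample = new ArrayList<>(capacity);
    }

    /** Call this from the reader's catch block, then return null for the record. */
    void record(String rawRecord, Exception cause) {
        badCount++;
        String entry = cause.getClass().getSimpleName() + ": " + rawRecord;
        if (sample.size() < capacity) {
            // Reservoir not full yet: keep everything.
            sample.add(entry);
        } else {
            // Algorithm R: pick a slot in [0, badCount); replace only if it falls in the reservoir.
            long slot = (long) (random.nextDouble() * badCount);
            if (slot < capacity) {
                sample.set((int) slot, entry);
            }
        }
    }

    long getBadCount() {
        return badCount;
    }

    List<String> getSample() {
        return Collections.unmodifiableList(sample);
    }

    /** Illustrative "parser": returns null on a parse error instead of throwing. */
    static Integer parseOrNull(String raw, BadRecordSampler sampler) {
        try {
            return Integer.valueOf(Integer.parseInt(raw));
        } catch (NumberFormatException e) {
            sampler.record(raw, e);
            return null;
        }
    }
}
```

At the end of the task you would log `getBadCount()` and the contents of `getSample()` once. That keeps the log size fixed no matter how much of the input is bad, and unlike logging only the first N rows, the sample is not biased toward the start of the file.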
On Mar 24, 2012 6:50 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

> Can do a counter and log the first few thousand rows or something ...
>
>
>
> On Mar 24, 2012, at 10:33 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
> > The pattern I use with bad data is to increment a counter and return null.
> > Logging an error message is also good, but that could turn into a massive
> > log file if there's a large dataset of bad data. Would be curious to hear
> > others' thoughts re the logging bit.
> >
> > Either way, I think this is a good change to make to AvroStorage.
> >
> > On Fri, Mar 23, 2012 at 7:03 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >
> >> One record in a 125MB Avro file is killing my script. I could patch
> >> AvroStorage() to catch the exception and return null after logging an
> >> error - I think. Should I?
> >>
> >> --
> >> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> >> datasyndrome.com
> >>
> >
> >
> >
> > --
> > *Note that I'm no longer using my Yahoo! email address. Please email me at
> > [EMAIL PROTECTED] going forward.*
>