Pig >> mail # user >> What should storefuncs do on parse errors while reading?


Russell Jurney 2012-03-24, 02:03
Bill Graham 2012-03-24, 17:33
Prashant Kommireddi 2012-03-24, 21:19
Fatal.error@...) 2012-03-24, 22:49
Re: What should storefuncs do on parse errors while reading?
I typically increment a counter and have a bounded log of randomly sampled
erroneous data.

stan
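
A minimal sketch of the "counter plus bounded log of randomly sampled erroneous data" idea, assuming reservoir sampling (Algorithm R) is an acceptable way to pick the sample. The class name `BadRecordSampler` and its parameters are hypothetical and are not part of Pig or AvroStorage; in a real job the count would more likely live in a job counter than in a Python attribute.

```python
import random


class BadRecordSampler:
    """Counts every bad record and keeps a bounded, uniformly random sample.

    Hypothetical helper for illustration; not a Pig or AvroStorage API.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity          # maximum number of records retained
        self.count = 0                    # total bad records seen so far
        self.sample = []                  # the bounded "log" of sampled records
        self._rng = random.Random(seed)

    def record(self, bad_row):
        self.count += 1
        if len(self.sample) < self.capacity:
            # Reservoir not yet full: keep everything.
            self.sample.append(bad_row)
        else:
            # Algorithm R: the new row replaces a random slot with
            # probability capacity / count, so every row seen so far
            # is equally likely to be in the sample.
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = bad_row


sampler = BadRecordSampler(capacity=100, seed=0)
for i in range(10_000):
    sampler.record(f"bad-row-{i}")
print(sampler.count, len(sampler.sample))
```

Memory stays bounded by `capacity` no matter how much bad data arrives, while the counter still reports the full extent of the problem.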
On Mar 24, 2012 6:50 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
wrote:

> Can do a counter and log the first few thousand rows or something ...
>
>
>
> On Mar 24, 2012, at 10:33 AM, Bill Graham <[EMAIL PROTECTED]> wrote:
>
> > The pattern I use with bad data is to increment a counter and return null.
> > Logging an error message is also good, but that could turn into a massive
> > log file if there's a large dataset of bad data. Would be curious to hear
> > others' thoughts re the logging bit.
> >
> > Either way, I think this is a good change to make to AvroStorage.
> >
> > On Fri, Mar 23, 2012 at 7:03 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:
> >
> >> One record in a 125MB Avro file is killing my script.  I could patch
> >> AvroStorage() to catch the exception and return null after logging an
> >> error - I think.  Should I?
> >>
> >> --
> >> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> >> datasyndrome.com
> >>
> >
> >
> >
> > --
> > *Note that I'm no longer using my Yahoo! email address. Please email me at
> > [EMAIL PROTECTED] going forward.*
>
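
The pattern discussed in the quoted messages (increment a counter, return null for the bad record, and cap how much gets logged) can be sketched roughly as follows. This is an illustration only, written in Python rather than against Pig's Java API; `TolerantReader`, `max_logged`, and the `name,value` line format are assumptions made for the example, not anything AvroStorage does.

```python
import logging

logger = logging.getLogger("bad_records")


class TolerantReader:
    """Iterates over parsed records, skipping (but counting) unparseable lines.

    Hypothetical helper for illustration; not a Pig or AvroStorage API.
    """

    def __init__(self, lines, max_logged=1000):
        self.lines = lines
        self.max_logged = max_logged   # log only the first N failures
        self.bad_records = 0           # total failures, logged or not

    def parse(self, line):
        """Return a (name, value) tuple, or None if the line is malformed."""
        try:
            name, value = line.split(",")
            return (name, int(value))
        except ValueError as exc:
            self.bad_records += 1
            if self.bad_records <= self.max_logged:
                logger.warning("skipping bad record %d: %s",
                               self.bad_records, exc)
            return None

    def __iter__(self):
        for line in self.lines:
            record = self.parse(line)
            if record is not None:
                yield record


reader = TolerantReader(["a,1", "garbage", "b,2", "c,x"], max_logged=1)
rows = list(reader)
print(rows, reader.bad_records)  # [('a', 1), ('b', 2)] 2
```

Capping the log at the first `max_logged` failures keeps the log file from growing with the amount of bad data, and the counter still records how many rows were dropped in total.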
Russell Jurney 2012-03-25, 23:03