Pig >> mail # user >> What should storefuncs do on parse errors while reading?


Re: What should storefuncs do on parse errors while reading?
The pattern I use with bad data is to increment a counter and return null.
Logging an error message is also good, but that could turn into a massive
log file if the dataset contains a lot of bad data. Would be curious to hear
others' thoughts re the logging bit.

Either way, I think this is a good change to make to AvroStorage.
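
To make the counter-plus-null pattern concrete, here is a minimal sketch in
plain Java. It is not AvroStorage code and does not use any Pig API: the class
name TolerantParser, the parseOrNull method, and the maxLoggedErrors cap are
all hypothetical names made up for illustration. The cap is one possible
answer to the log-size worry: log the first few failures, then only count.

```java
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Pig or AvroStorage): wraps a record parser,
 * counts records that fail to parse, and returns null for them.
 */
class TolerantParser<T> {
    private final Function<String, T> parser;   // assumed parser: raw text -> record
    private final int maxLoggedErrors;          // assumed cap on logged failures
    private long badRecords = 0;

    TolerantParser(Function<String, T> parser, int maxLoggedErrors) {
        this.parser = parser;
        this.maxLoggedErrors = maxLoggedErrors;
    }

    /** Returns the parsed record, or null if the parser threw. */
    T parseOrNull(String raw) {
        try {
            return parser.apply(raw);
        } catch (RuntimeException e) {
            badRecords++;
            if (badRecords <= maxLoggedErrors) {
                // Log only the first few failures so the log stays small.
                System.err.println("Skipping bad record #" + badRecords + ": " + e);
            } else if (badRecords == (long) maxLoggedErrors + 1) {
                // One final notice, then count silently.
                System.err.println("Further bad-record messages suppressed");
            }
            return null;
        }
    }

    /** Total number of records that failed to parse so far. */
    long getBadRecords() {
        return badRecords;
    }
}
```

In a real job the count would more likely be reported through the job's own
counter mechanism than kept in a field, and whether null is a safe return
value depends on how the caller interprets it, so treat both as assumptions
of this sketch.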

On Fri, Mar 23, 2012 at 7:03 PM, Russell Jurney <[EMAIL PROTECTED]> wrote:

> One record in a 125MB avro file is killing my script.  I could patch
> AvroStorage() to catch the exception and return null after logging an error
> - I think.  Should I?
>
> --
> Russell Jurney twitter.com/rjurney [EMAIL PROTECTED]
> datasyndrome.com
>

--
*Note that I'm no longer using my Yahoo! email address. Please email me at
[EMAIL PROTECTED] going forward.*