Re: Review Request: PIG-3059 Global configurable minimum 'bad record' thresholds
Cheolsoo Park 2012-12-31, 01:56
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8765/
-----------------------------------------------------------

(Updated Dec. 31, 2012, 1:56 a.m.)


Review request for pig, Santhosh Srinivasan, Jonathan Coveney, and Joseph Adler.


Changes
-------

- The error rate is printed as part of job stats.
- The error message is improved. Now the location of the bad split that causes the run-time exception is printed.
- InputErrorTracker counts the number of splits instead of records.
- For backward compatibility, ignore_bad_files is not removed. When the ignore_bad_files option is enabled in AvroStorage, it is equivalent to setting pig.load.bad.split.threshold to 1.0.


Description
-------

This patch implements configurable bad records thresholds based on work done by Jonathan in PIG-2614.

The changes include:
- Adds new Pig properties - pig.load.bad.record.threshold and pig.load.bad.record.min.
- Removes 'ignore_bad_files' option from AvroStorage since it's no longer needed.
- Incorporates InputErrorTracker class written by Jonathan in PIG-2614.
- Adds a try-catch block to nextKeyValue() method in PigRecordReader.
- Adds new test cases to TestAvroStorage for these new properties.


This addresses bug PIG-3059.
    https://issues.apache.org/jira/browse/PIG-3059


Diffs (updated)
-----

  conf/pig.properties 001a75e
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 771c313
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java 0a84915
  contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java 9c37fec
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java 28a448f
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile2.avro e69de29
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile3.avro e69de29
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testCorruptedFile4.avro e69de29
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/bad.avro e69de29
  contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/test_corrupted_file/good.avro e69de29
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/InputErrorTracker.java e69de29
  src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigRecordReader.java 6c77bad
  src/org/apache/pig/tools/pigstats/EmbeddedPigStats.java 45135b6
  src/org/apache/pig/tools/pigstats/JobStats.java bdc08a5
  src/org/apache/pig/tools/pigstats/PigStats.java 0228997
  src/org/apache/pig/tools/pigstats/PigStatsUtil.java 521a482
  src/org/apache/pig/tools/pigstats/SimplePigStats.java e4cd1c0

Diff: https://reviews.apache.org/r/8765/diff/


Testing
-------

ant clean commit-test
ant clean compile-test jar-withouthadoop
cd contrib/piggybank/java
ant clean test -Dtestcase=TestAvroStorage


Thanks,

Cheolsoo Park
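

Illustration
-------

For readers who have not opened the diff, here is a rough sketch of how a threshold plus a minimum error count could gate a load failure. It is not the InputErrorTracker from PIG-2614. The class and method names (BadSplitTrackerSketch, ErrorTracker, recordInput, recordError) are hypothetical, and the exact rule used here (ignore the rate until the error count reaches the minimum, then fail once the rate exceeds the threshold) is an assumption about how the two properties interact, not something confirmed by the patch.

```java
// Hypothetical sketch only; names and semantics are assumptions, not Pig's API.
class BadSplitTrackerSketch {

    static final class ErrorTracker {
        private final double threshold; // assumed: highest tolerated bad fraction, 0.0 to 1.0
        private final long minErrors;   // assumed: the rate is not checked below this error count
        private long total;             // inputs (splits or records) seen so far
        private long errors;            // inputs that failed to load

        ErrorTracker(double threshold, long minErrors) {
            this.threshold = threshold;
            this.minErrors = minErrors;
        }

        /** Call once per input before trying to read it. */
        void recordInput() {
            total++;
        }

        /** Call from the catch block around the read; rethrows once the limit is crossed. */
        void recordError(Throwable cause) {
            errors++;
            if (errors < minErrors) {
                return; // too few errors so far to judge the rate
            }
            if (errorRate() > threshold) {
                throw new IllegalStateException(
                        "error rate " + errorRate() + " exceeds threshold " + threshold, cause);
            }
        }

        double errorRate() {
            return total == 0 ? 0.0 : (double) errors / total;
        }
    }

    public static void main(String[] args) {
        // Tolerate up to half of the inputs being bad, after at least two errors.
        ErrorTracker tracker = new ErrorTracker(0.5, 2);
        for (int i = 0; i < 4; i++) {
            tracker.recordInput();
        }
        tracker.recordError(new RuntimeException("corrupted input"));
        System.out.println("error rate so far: " + tracker.errorRate());
    }
}
```

Under these assumptions a threshold of 1.0 can never be exceeded, which is consistent with the note above that enabling ignore_bad_files is equivalent to setting the threshold to 1.0.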