|
Joseph Adler
2012-11-17, 05:28
Cheolsoo Park
2012-12-03, 19:22
Joseph Adler
2012-12-03, 22:55
Joseph Adler
2012-12-06, 00:27
Joseph Adler
2012-12-17, 19:36
Joseph Adler
2012-12-20, 17:24
Joseph Adler
2013-01-04, 19:22
Joseph Adler
2013-01-04, 19:22
Jonathan Coveney
2013-03-19, 16:40
|
-
Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2012-11-17, 05:28
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/ ----------------------------------------------------------- Review request for pig and Cheolsoo Park. Description ------- The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). This addresses bug PIG-3015. https://issues.apache.org/jira/browse/PIG-3015 Diffs ----- build.xml 7d468a0 ivy.xml 70e8d50 ivy/libraries.properties 317564f src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION src/org/apache/pig/impl/util/AvroBagWrapper.java PRE-CREATION src/org/apache/pig/impl/util/AvroMapWrapper.java PRE-CREATION src/org/apache/pig/impl/util/AvroRecordReader.java PRE-CREATION src/org/apache/pig/impl/util/AvroRecordWriter.java PRE-CREATION src/org/apache/pig/impl/util/AvroStorageDataConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/AvroTupleWrapper.java PRE-CREATION test/commit-tests 5081fbc test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION test/org/apache/pig/builtin/avro/createTests.bash PRE-CREATION test/org/apache/pig/builtin/avro/createests.py PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/deflate/records.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/deflate/recordsAsOutputByPig.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/snappy/records.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordWithRepeatedSubRecords.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/records.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsAsOutputByPig.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsOfArrays.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsOfArraysOfRecords.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsSubSchema.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsSubSchemaNullable.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithEnums.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithFixed.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithMaps.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithMapsOfRecords.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recordsWithNullableUnions.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/avro/uncompressed/recursiveRecord.avro PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithNullabl
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageCheolsoo Park 2012-12-03, 19:22
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/#review13962 ----------------------------------------------------------- Overall looks great! I haven't gone through the test cases yet, but here are my comments so far. 1) I noticed that I cannot load .avro files that are not record types. For example, I tried to load a .avro file whose schema is "int" as follows: [cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar getschema foo2/test_int.avro "int" [cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar tojson foo2/test_int.avro 1 in = LOAD 'foo2/test_int.avro' USING AvroStorage('int'); DUMP in; This gives me the following error: Caused by: java.io.IOException: avroSchemaToResourceSchema only processes records Can only Avro record type be loaded? Or am I doing something wrong? 2) TestAvroStorage needs to be more automated. To run it, I had to run the following commands: ant clean compile-test cd ./test/org/apache/pig/builtin/avro python createests.py cd - ant clean test -Dtestcase=TestAvroStorage Ideally, I should be able to run a single command: ant clean -Dtestcase=TestAvroStorage. Please let me know if you need help for this. 3) python createests.py fails with the following errors. I suppose that some files are missing: creating data/avro/uncompressed/testDirectoryCounts.avro Exception in thread "main" java.io.FileNotFoundException: data/json/testDirectoryCounts.json (No such file or directory) ... creating evenFileNameTestDirectoryCounts.avro Exception in thread "main" java.io.FileNotFoundException: data/json/evenFileNameTestDirectoryCounts.json (No such file or directory) ... 4) ant test -Dtestcase=TestAvroStorage fails with the following errors. I suppose that this is due to the missing files: Testcase: testLoadDirectory took 0.005 sec FAILED Testcase: testLoadGlob took 0.004 sec FAILED Testcase: testPartialLoadGlob took 0.005 sec FAILED 5) Typo in the name of createests.py. It should be createtests.py. 6) Is createTests.bash needed at all? If not, can you remove it? I have more comments inline: ivy/libraries.properties <https://reviews.apache.org/r/8104/#comment29844> Can you update .eclipse.templates/.classpath too? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29845> Can you remove this comment? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29850> Can you change Exception to SchemaParseException? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29907> Wouldn't it be better to make this static? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29847> FileSystem.get(new Configuration()).open() will look for a file on the default FS that is specified by Hadoop conf. But it won't work in the following case: 1) The default FS is set by installed Hadoop conf as follows: <name>fs.default.name</name> <value>hdfs://cheolsoo-mr1-0.ent.cloudera.com:8020</value> 2) Run Pig in *local* mode and specify a schema file that is on local FS. This gives me the following error: java.io.FileNotFoundException: File does not exist: /user/cheolsoo/test_record.avsc 3) Even if I specify the uri schema as file:/, this still gives me the following error: java.lang.IllegalArgumentException: Wrong FS: file:/home/cheolsoo/pig-svn/test_record.avsc, expected: hdfs://cheolsoo-mr1-0.ent.cloudera.com:8020 Can you instead do the following? Path p = new Path(...); (FileSystem.get(p.toUri(), new Configuration()).open(p); Then, local FS will be used if file:/ is specified in path, and the default FS is if no uri scheme is specified. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29848> Same problem as above. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29849> Can you split this into two catch blocks: ParseException and IOException? It seems incorrect that the help message is printed when FileSystem throws an IOExcetion. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29851> Can you filter out files that start with "." as well? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29852> Same as above. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29853> This won't work in the following case. Let's say p matches two dirs, and one dir is empty. p = foo* foo1 foo2/bar.avro I would expect the schema of bar.avro is returned, but I get an IOException instead. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29854> Typo: not => No. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29857> Can you rename this class without Fast? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29856> Is this needed? In the constructor, schema is supposed to be set. If not, there must be an error. Shouldn't we throw an exception instead of re-trying to set schema? Please correct me if I am wrong. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29855> Can you replace ".avro" with AvroOutputFormat.EXT? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29870> Can you move this comment to the checkSchema() method? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment29858> Can you move this comment to the prepareToWrite() method? src/org/apache/pig/builtin/
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2012-12-03, 22:55
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/builtin/AvroStorage.java, lines 171-172 > > <https://reviews.apache.org/r/8104/diff/1/?file=191564#file191564line171> > > > > Same problem as above. Fixing this one within getAvroSchema > On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/builtin/AvroStorage.java, lines 382-388 > > <https://reviews.apache.org/r/8104/diff/1/?file=191564#file191564line382> > > > > Is this needed? > > > > In the constructor, schema is supposed to be set. If not, there must be an error. Shouldn't we throw an exception instead of re-trying to set schema? > > > > Please correct me if I am wrong. Pretty sure you're right about this one (and that this code is redundant). > On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/builtin/TrevniStorage.java, line 160 > > <https://reviews.apache.org/r/8104/diff/1/?file=191565#file191565line160> > > > > AvroStorage accepts files that do not end ".avro". Shouldn't TrevniStorage do the same? Good point... though I realize that I've defined "visible avro files" and "visible trevni files" methods that are probably not useful. I should probably just drop the methods. > On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/impl/util/AvroRecordReader.java, lines 110-118 > > <https://reviews.apache.org/r/8104/diff/1/?file=191568#file191568line110> > > > > I can't find where -ignoreErrors is used. I guess that error handling for bad files is not implemented yet? No, I haven't implemented it yet. I suspect that the best way to implement the error ignoring functionality is from within Pig, and should apply to all file types (not just Avro)... I'll add that discussion to the right JIRA thread > On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java, lines 85-91 > > <https://reviews.apache.org/r/8104/diff/1/?file=191571#file191571line85> > > > > How about a union type that contains a single data type (e.g. ["string"])? They're currently supported. Good point; that's a trivial change. Adding that > On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/impl/util/AvroTupleWrapper.java, line 163 > > <https://reviews.apache.org/r/8104/diff/1/?file=191572#file191572line163> > > > > Can you instead use log.debug(..., e)? Just added the exception to the line logging line above - Joseph ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/#review13962 ----------------------------------------------------------- On Nov. 17, 2012, 5:28 a.m., Joseph Adler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8104/ > ----------------------------------------------------------- > > (Updated Nov. 17, 2012, 5:28 a.m.) > > > Review request for pig and Cheolsoo Park. > > > Description > ------- > > The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) > > I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. > > This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). > > > This addresses bug PIG-3015. > https://issues.apache.org/jira/browse/PIG-3015 > > > Diffs > ----- > > build.xml 7d468a0 > ivy.xml 70e8d50 > ivy/libraries.properties 317564f > src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2012-12-06, 00:27
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > Overall looks great! I haven't gone through the test cases yet, but here are my comments so far. > > > > > > 1) I noticed that I cannot load .avro files that are not record types. For example, I tried to load a .avro file whose schema is "int" as follows: > > > > [cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar getschema foo2/test_int.avro > > "int" > > > > [cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar tojson foo2/test_int.avro > > 1 > > > > in = LOAD 'foo2/test_int.avro' USING AvroStorage('int'); > > DUMP in; > > > > This gives me the following error: > > > > Caused by: java.io.IOException: avroSchemaToResourceSchema only processes records > > > > Can only Avro record type be loaded? Or am I doing something wrong? > > > > > > 2) TestAvroStorage needs to be more automated. To run it, I had to run the following commands: > > > > ant clean compile-test > > cd ./test/org/apache/pig/builtin/avro > > python createests.py > > cd - > > ant clean test -Dtestcase=TestAvroStorage > > > > Ideally, I should be able to run a single command: ant clean -Dtestcase=TestAvroStorage. Please let me know if you need help for this. > > > > > > 3) python createests.py fails with the following errors. I suppose that some files are missing: > > > > creating data/avro/uncompressed/testDirectoryCounts.avro > > Exception in thread "main" java.io.FileNotFoundException: data/json/testDirectoryCounts.json (No such file or directory) > > ... > > creating evenFileNameTestDirectoryCounts.avro > > Exception in thread "main" java.io.FileNotFoundException: data/json/evenFileNameTestDirectoryCounts.json (No such file or directory) > > ... > > > > > > 4) ant test -Dtestcase=TestAvroStorage fails with the following errors. I suppose that this is due to the missing files: > > > > Testcase: testLoadDirectory took 0.005 sec > > FAILED > > Testcase: testLoadGlob took 0.004 sec > > FAILED > > Testcase: testPartialLoadGlob took 0.005 sec > > FAILED > > > > > > 5) Typo in the name of createests.py. It should be createtests.py. > > > > > > 6) Is createTests.bash needed at all? If not, can you remove it? > > > > > > I have more comments inline: Sounds like the python script isn't working completely correctly. I'll debug that script and make sure it generates all the required files. Can I take you up on your offer to help automate that build process? I'm not exactly sure what to modify to automatically run the python script to create the test files. > On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote: > > src/org/apache/pig/builtin/AvroStorage.java, lines 296-305 > > <https://reviews.apache.org/r/8104/diff/1/?file=191564#file191564line296> > > > > This won't work in the following case. Let's say p matches two dirs, and one dir is empty. > > > > p = foo* > > > > foo1 > > foo2/bar.avro > > > > I would expect the schema of bar.avro is returned, but I get an IOException instead. Added proper depth first search to find the first file. (I decided to sort by modification date, most recent first.) - Joseph ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/#review13962 ----------------------------------------------------------- On Nov. 17, 2012, 5:28 a.m., Joseph Adler wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8104/ > ----------------------------------------------------------- > > (Updated Nov. 17, 2012, 5:28 a.m.) > > > Review request for pig and Cheolsoo Park. > > > Description > ------- > > The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2012-12-17, 19:36
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/ ----------------------------------------------------------- (Updated Dec. 17, 2012, 7:36 p.m.) Review request for pig and Cheolsoo Park. Changes ------- Added test cases for Trevni (and made sure all the test cases pass) Description ------- The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). This addresses bug PIG-3015. https://issues.apache.org/jira/browse/PIG-3015 Diffs (updated) ----- .eclipse.templates/.classpath aa9bfd5 build.xml 1f21839 ivy.xml 70e8d50 ivy/libraries.properties bfbbbc0 src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION test/commit-tests 5081fbc test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION test/org/apache/pig/builtin/avro/createtests.py PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/simpleRecordsTrevni.json PRE-CREATION test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithDoubleUnderscores.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/simpleRecordsTrevni.avsc PRE-CREATION test/unit-tests 7cede06 Diff: https://reviews.apache.org/r/8104/diff/ Testing Thanks, Joseph Adler
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2012-12-20, 17:24
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/ ----------------------------------------------------------- (Updated Dec. 20, 2012, 5:24 p.m.) Review request for pig and Cheolsoo Park. Changes ------- Several bug fixes. (A couple fixes to get the tests to run correctly, and a fix to TrevniStorage so that the schema is correctly propagated to mappers/reducers in the storefunc) Description ------- The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). This addresses bug PIG-3015. https://issues.apache.org/jira/browse/PIG-3015 Diffs (updated) ----- .eclipse.templates/.classpath aa9bfd5 build.xml 1f21839 ivy.xml 70e8d50 ivy/libraries.properties bfbbbc0 src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION test/commit-tests 5081fbc test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION test/org/apache/pig/builtin/avro/createtests.py PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/simpleRecordsTrevni.json PRE-CREATION test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithDoubleUnderscores.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/simpleRecordsTrevni.avsc PRE-CREATION test/unit-tests 7cede06 Diff: https://reviews.apache.org/r/8104/diff/ Testing Thanks, Joseph
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2013-01-04, 19:22
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/ ----------------------------------------------------------- (Updated Jan. 4, 2013, 7:22 p.m.) Review request for pig and Cheolsoo Park. Changes ------- Fixes to make compression work Description ------- The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). This addresses bug PIG-3015. https://issues.apache.org/jira/browse/PIG-3015 Diffs (updated) ----- .eclipse.templates/.classpath c7b83b8 ivy.xml 70e8d50 ivy/libraries.properties 7b07c7e src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION test/commit-tests 5081fbc test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithDoubleUnderscores.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/simpleRecordsTrevni.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/testDirectory.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/testDirectoryCounts.avsc PRE-CREATION test/unit-tests 7cede06 Diff: https://reviews.apache.org/r/8104/diff/ Testing Thanks, Joseph Adler
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJoseph Adler 2013-01-04, 19:22
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/ ----------------------------------------------------------- (Updated Jan. 4, 2013, 7:22 p.m.) Review request for pig and Cheolsoo Park. Description ------- The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). This addresses bug PIG-3015. https://issues.apache.org/jira/browse/PIG-3015 Diffs ----- .eclipse.templates/.classpath c7b83b8 ivy.xml 70e8d50 ivy/libraries.properties 7b07c7e src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION test/commit-tests 5081fbc test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithDoubleUnderscores.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithEnums.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithFixed.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMaps.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithMapsOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsWithNullableUnions.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recursiveRecord.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/simpleRecordsTrevni.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/testDirectory.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/testDirectoryCounts.avsc PRE-CREATION test/unit-tests 7cede06 Diff: https://reviews.apache.org/r/8104/diff/ Testing Thanks, Joseph Adler
-
Re: Review Request: PIG-3015 Rewrite of AvroStorageJonathan Coveney 2013-03-19, 16:40
----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/#review18077 ----------------------------------------------------------- src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38174> Is there a reason why this doesn't support Pushdown projection? I'm ok with making that another JIRA ticket to tack on, but it seems to me that there is no reason to make AVRO deserialize a field that is never used, and it would be easy to leverage AVRO's already existing schema resolution to do this (we have an expected reader schema and could just prune unused columns, or somesuch). src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38166> Given that allowRecursive is a boolean, why do you need to box and then unbox this? hasOption returns a boolean (at least in the API doc I saw)? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38167> this should all be log.error imho, since it leads to errors that terminate the script. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38168> I don't love member names that start with capitals (especially when those members sound like classes), but I am willing to forego that if you're passionate about it :) src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38169> can the schema file not be an invisible file? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38170> I realize using Long's compareTo is convenient, but this seems like unnecessary boxing. why not just compare them directly? I realize this isn't performance critical cord, it just stuck out to me, since you could just do a > instead... src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38171> isn't this implicit? src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38172> I am a fan of putting stuff like this in PigConstants, though I haven't gotten this to catch on. Mainly just so we can see what is actually being set in the jobconf but not required at all. src/org/apache/pig/builtin/AvroStorage.java <https://reviews.apache.org/r/8104/#comment38173> same as above...basically I think if it halts everything, it should be log.error, or just nothing at all. but everything is open to debate :) src/org/apache/pig/builtin/TrevniStorage.java <https://reviews.apache.org/r/8104/#comment38175> Is it part of Trevni that the metadata is snappy encoded? src/org/apache/pig/impl/util/avro/AvroRecordWriter.java <https://reviews.apache.org/r/8104/#comment38176> gasp! trailing whitespace! :) src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java <https://reviews.apache.org/r/8104/#comment38177> since I can imagine this failing, I think it may be useful to catch ClassPathException here, log some info about the data types at play, and then just rethrow it. Just thinking ahead to debugging the case where someone saves something with a schema that doesn't match their expected schema. Maybe this would be caught upstream first? src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java <https://reviews.apache.org/r/8104/#comment38178> Not a huge fan of this, esp. given that the Exception stack trace will let them know where the error happened. src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java <https://reviews.apache.org/r/8104/#comment38179> More info will be useful here, esp. since this doesn't handle DateTime, BigInteger, BigDecimal src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java <https://reviews.apache.org/r/8104/#comment38180> If you have a map with a complex value, such as an array, will it then be a tuple wrapped in a tuple? Or does avro not allow this? src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java <https://reviews.apache.org/r/8104/#comment38181> might be worth log.debug'ing this, even though they asked for it, it's worth knowing when it is applied src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java <https://reviews.apache.org/r/8104/#comment38182> May want to throw an UnsupportedOperationException instead, as if this is being called, it's a more fundamental issue with Pig, separate from write related issues. src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java <https://reviews.apache.org/r/8104/#comment38184> shouldn't this throw an error? Or is avroObject.put() doing something I don't expect, perhaps being 1-indexed instead of 0-indexed? src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java <https://reviews.apache.org/r/8104/#comment38183> What about other types? Bags etc? src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java <https://reviews.apache.org/r/8104/#comment38185> Is the fields list returned by getSchema().getFields() a copy, instead of the original? If it's the original, it'd already be mutated, adn there is no reason to reset, no? src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java <https://reviews.apache.org/r/8104/#comment38186> lol src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java <https://reviews.apache.org/r/8104/#comment38187> I suppose this doesn't have to be hyper-exact, but this does not take into account the fact that classes are always on 8-byte boundaries, as well as the overhead of classes themselves. test/org/apache/pig/builtin/avro/code/pig/directory_test.pig <https://reviews.apache.org/r/8104/#comment38188> trailing whitespace :) - Jonathan Coveney On Jan. 4, 2013, 7:22 p.m., Joseph Adler wrote: |