Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # dev - Review Request: PIG-3015 Rewrite of AvroStorage


+
Joseph Adler 2012-11-17, 05:28
Copy link to this message
-
Re: Review Request: PIG-3015 Rewrite of AvroStorage
Cheolsoo Park 2012-12-03, 19:22

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/#review13962
-----------------------------------------------------------
Overall looks great! I haven't gone through the test cases yet, but here are my comments so far.
1) I noticed that I cannot load .avro files that are not record types. For example, I tried to load a .avro file whose schema is "int" as follows:

[cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar getschema foo2/test_int.avro
"int"

[cheolsoo@cheolsoo-mr1-0 pig-svn]$ java -jar avro-tools-1.5.4.jar tojson foo2/test_int.avro
1

in = LOAD 'foo2/test_int.avro' USING AvroStorage('int');
DUMP in;

This gives me the following error:

Caused by: java.io.IOException: avroSchemaToResourceSchema only processes records

Can only Avro record type be loaded? Or am I doing something wrong?
2) TestAvroStorage needs to be more automated. To run it, I had to run the following commands:

ant clean compile-test
cd ./test/org/apache/pig/builtin/avro
python createests.py
cd -
ant clean test -Dtestcase=TestAvroStorage

Ideally, I should be able to run a single command: ant clean -Dtestcase=TestAvroStorage. Please let me know if you need help for this.
3) python createests.py fails with the following errors. I suppose that some files are missing:

creating data/avro/uncompressed/testDirectoryCounts.avro
Exception in thread "main" java.io.FileNotFoundException: data/json/testDirectoryCounts.json (No such file or directory)
...
creating evenFileNameTestDirectoryCounts.avro
Exception in thread "main" java.io.FileNotFoundException: data/json/evenFileNameTestDirectoryCounts.json (No such file or directory)
...
4) ant test -Dtestcase=TestAvroStorage fails with the following errors. I suppose that this is due to the missing files:

Testcase: testLoadDirectory took 0.005 sec
    FAILED
Testcase: testLoadGlob took 0.004 sec
    FAILED
Testcase: testPartialLoadGlob took 0.005 sec
    FAILED
5) Typo in the name of createests.py. It should be createtests.py.
6) Is createTests.bash needed at all? If not, can you remove it?
I have more comments inline:
ivy/libraries.properties
<https://reviews.apache.org/r/8104/#comment29844>

    Can you update .eclipse.templates/.classpath too?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29845>

    Can you remove this comment?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29850>

    Can you change Exception to SchemaParseException?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29907>

    Wouldn't it be better to make this static?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29847>

    FileSystem.get(new Configuration()).open() will look for a file on the default FS that is specified by Hadoop conf. But it won't work in the following case:
    
    1) The default FS is set by installed Hadoop conf as follows:
    <name>fs.default.name</name>
    <value>hdfs://cheolsoo-mr1-0.ent.cloudera.com:8020</value>
    
    2) Run Pig in *local* mode and specify a schema file that is on local FS. This gives me the following error:
    java.io.FileNotFoundException: File does not exist: /user/cheolsoo/test_record.avsc
    
    3) Even if I specify the uri schema as file:/, this still gives me the following error:
    java.lang.IllegalArgumentException: Wrong FS: file:/home/cheolsoo/pig-svn/test_record.avsc, expected: hdfs://cheolsoo-mr1-0.ent.cloudera.com:8020
    
    
    Can you instead do the following?
    
    Path p = new Path(...);
    (FileSystem.get(p.toUri(), new Configuration()).open(p);
    
    Then, local FS will be used if file:/ is specified in path, and the default FS is if no uri scheme is specified.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29848>

    Same problem as above.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29849>

    Can you split this into two catch blocks: ParseException and IOException?
    
    It seems incorrect that the help message is printed when FileSystem throws an IOExcetion.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29851>

    Can you filter out files that start with "." as well?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29852>

    Same as above.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29853>

    This won't work in the following case. Let's say p matches two dirs, and one dir is empty.
    
    p = foo*
    
    foo1
    foo2/bar.avro
    
    I would expect the schema of bar.avro is returned, but I get an IOException instead.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29854>

    Typo: not => No.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29857>

    Can you rename this class without Fast?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29856>

    Is this needed?
    
    In the constructor, schema is supposed to be set. If not, there must be an error. Shouldn't we throw an exception instead of re-trying to set schema?
    
    Please correct me if I am wrong.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29855>

    Can you replace ".avro" with AvroOutputFormat.EXT?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29870>

    Can you move this comment to the checkSchema() method?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment29858>

    Can you move this comment to the prepareToWrite() method?

src/org/apache/pig/builtin/
+
Joseph Adler 2012-12-06, 00:27
+
Joseph Adler 2012-12-03, 22:55
+
Joseph Adler 2012-12-17, 19:36
+
Joseph Adler 2012-12-20, 17:24
+
Joseph Adler 2013-01-04, 19:22
+
Joseph Adler 2013-01-04, 19:22
+
Jonathan Coveney 2013-03-19, 16:40
+
Joseph Adler 2013-05-20, 16:38
+
Joseph Adler 2013-05-20, 16:38
+
Joseph Adler 2013-05-20, 16:39