Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Pig, mail # dev - Review Request: PIG-3015 Rewrite of AvroStorage


+
Joseph Adler 2012-11-17, 05:28
+
Cheolsoo Park 2012-12-03, 19:22
+
Joseph Adler 2012-12-06, 00:27
+
Joseph Adler 2012-12-03, 22:55
+
Joseph Adler 2012-12-17, 19:36
+
Joseph Adler 2012-12-20, 17:24
+
Joseph Adler 2013-01-04, 19:22
+
Joseph Adler 2013-01-04, 19:22
Copy link to this message
-
Re: Review Request: PIG-3015 Rewrite of AvroStorage
Jonathan Coveney 2013-03-19, 16:40

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/#review18077
-----------------------------------------------------------

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38174>

    Is there a reason why this doesn't support Pushdown projection? I'm ok with making that another JIRA ticket to tack on, but it seems to me that there is no reason to make AVRO deserialize a field that is never used, and it would be easy to leverage AVRO's already existing schema resolution to do this (we have an expected reader schema and could just prune unused columns, or somesuch).

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38166>

    Given that allowRecursive is a boolean, why do you need to box and then unbox this? hasOption returns a boolean (at least in the API doc I saw)?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38167>

    this should all be log.error imho, since it leads to errors that terminate the script.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38168>

    I don't love member names that start with capitals (especially when those members sound like classes), but I am willing to forego that if you're passionate about it :)

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38169>

    can the schema file not be an invisible file?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38170>

    I realize using Long's compareTo is convenient, but this seems like unnecessary boxing. why not just compare them directly? I realize this isn't performance critical cord, it just stuck out to me, since you could just do a > instead...

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38171>

    isn't this implicit?

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38172>

    I am a fan of putting stuff like this in PigConstants, though I haven't gotten this to catch on. Mainly just so we can see what is actually being set in the jobconf but not required at all.

src/org/apache/pig/builtin/AvroStorage.java
<https://reviews.apache.org/r/8104/#comment38173>

    same as above...basically I think if it halts everything, it should be log.error, or just nothing at all. but everything is open to debate :)

src/org/apache/pig/builtin/TrevniStorage.java
<https://reviews.apache.org/r/8104/#comment38175>

    Is it part of Trevni that the metadata is snappy encoded?

src/org/apache/pig/impl/util/avro/AvroRecordWriter.java
<https://reviews.apache.org/r/8104/#comment38176>

    gasp! trailing whitespace! :)

src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38177>

    since I can imagine this failing, I think it may be useful to catch ClassPathException here, log some info about the data types at play, and then just rethrow it. Just thinking ahead to debugging the case where someone saves something with a schema that doesn't match their expected schema. Maybe this would be caught upstream first?

src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38178>

    Not a huge fan of this, esp. given that the Exception stack trace will let them know where the error happened.

src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38179>

    More info will be useful here, esp. since this doesn't handle DateTime, BigInteger, BigDecimal

src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38180>

    If you have a map with a complex value, such as an array, will it then be a tuple wrapped in a tuple? Or does avro not allow this?

src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java
<https://reviews.apache.org/r/8104/#comment38181>

    might be worth log.debug'ing this, even though they asked for it, it's worth knowing when it is applied

src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38182>

    May want to throw an UnsupportedOperationException instead, as if this is being called, it's a more fundamental issue with Pig, separate from write related issues.

src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38184>

    shouldn't this throw an error? Or is avroObject.put() doing something I don't expect, perhaps being 1-indexed instead of 0-indexed?

src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38183>

    What about other types? Bags etc?

src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38185>

    Is the fields list returned by getSchema().getFields() a copy, instead of the original? If it's the original, it'd already be mutated, adn there is no reason to reset, no?

src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38186>

    lol

src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java
<https://reviews.apache.org/r/8104/#comment38187>

    I suppose this doesn't have to be hyper-exact, but this does not take into account the fact that classes are always on 8-byte boundaries, as well as the overhead of classes themselves.

test/org/apache/pig/builtin/avro/code/pig/directory_test.pig
<https://reviews.apache.org/r/8104/#comment38188>

    trailing whitespace :)
- Jonathan Coveney
On Jan. 4, 2013, 7:22 p.m., Joseph Adler wrote:
+
Joseph Adler 2013-05-20, 16:38
+
Joseph Adler 2013-05-20, 16:38
+
Joseph Adler 2013-05-20, 16:39