Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Review Request: PIG-3015 Rewrite of AvroStorage


Copy link to this message
-
Re: Review Request: PIG-3015 Rewrite of AvroStorage


> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/builtin/AvroStorage.java, lines 171-172
> > <https://reviews.apache.org/r/8104/diff/1/?file=191564#file191564line171>
> >
> >     Same problem as above.

Fixing this one within getAvroSchema
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/builtin/AvroStorage.java, lines 382-388
> > <https://reviews.apache.org/r/8104/diff/1/?file=191564#file191564line382>
> >
> >     Is this needed?
> >    
> >     In the constructor, schema is supposed to be set. If not, there must be an error. Shouldn't we throw an exception instead of re-trying to set schema?
> >    
> >     Please correct me if I am wrong.

Pretty sure you're right about this one (and that this code is redundant).
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/builtin/TrevniStorage.java, line 160
> > <https://reviews.apache.org/r/8104/diff/1/?file=191565#file191565line160>
> >
> >     AvroStorage accepts files that do not end ".avro". Shouldn't TrevniStorage do the same?

Good point... though I realize that I've defined "visible avro files" and "visible trevni files" methods that are probably not useful. I should probably just drop the methods.
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/impl/util/AvroRecordReader.java, lines 110-118
> > <https://reviews.apache.org/r/8104/diff/1/?file=191568#file191568line110>
> >
> >     I can't find where -ignoreErrors is used. I guess that error handling for bad files is not implemented yet?

No, I haven't implemented it yet. I suspect that the best way to implement the error ignoring functionality is from within Pig, and should apply to all file types (not just Avro)... I'll add that discussion to the right JIRA thread
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/impl/util/AvroStorageSchemaConversionUtilities.java, lines 85-91
> > <https://reviews.apache.org/r/8104/diff/1/?file=191571#file191571line85>
> >
> >     How about a union type that contains a single data type (e.g. ["string"])? They're currently supported.

Good point; that's a trivial change. Adding that
> On Dec. 3, 2012, 7:22 p.m., Cheolsoo Park wrote:
> > src/org/apache/pig/impl/util/AvroTupleWrapper.java, line 163
> > <https://reviews.apache.org/r/8104/diff/1/?file=191572#file191572line163>
> >
> >     Can you instead use log.debug(..., e)?

Just added the exception to the line logging line above
- Joseph
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8104/#review13962
-----------------------------------------------------------
On Nov. 17, 2012, 5:28 a.m., Joseph Adler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8104/
> -----------------------------------------------------------
>
> (Updated Nov. 17, 2012, 5:28 a.m.)
>
>
> Review request for pig and Cheolsoo Park.
>
>
> Description
> -------
>
> The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)
>
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni.
>
> This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing).
>
>
> This addresses bug PIG-3015.
>     https://issues.apache.org/jira/browse/PIG-3015
>
>
> Diffs
> -----
>
>   build.xml 7d468a0
>   ivy.xml 70e8d50
>   ivy/libraries.properties 317564f
>   src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION