Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig, mail # dev - Re: Review Request: Changes for PIG-3321


Copy link to this message
-
Re: Review Request: Changes for PIG-3321
Harvey Chong 2013-05-15, 15:44


> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 550
> > <https://reviews.apache.org/r/11155/diff/1/?file=291819#file291819line550>
> >
> >     Can we do this for same and schema_uri too?

Do you mean that for the 'same' and 'schema_file' options, the specified schema should be populated in userSpecifiedInputAvroSchema?  I do not see a 'schema_uri' option.
> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroDatumReader.java, line 69
> > <https://reviews.apache.org/r/11155/diff/1/?file=291821#file291821line69>
> >
> >     Why is this temporary array required when data is already being read in in.readFieldOrder() ? Am I missing something?

The data is being read in the order it was written by the writer, based on the writer schema.  We want to reorder the fields to be in the order specified by the reader schema.
> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroInputFormat.java, line 59
> > <https://reviews.apache.org/r/11155/diff/1/?file=291822#file291822line59>
> >
> >     Shouldn't it be input data schema? Or we can call it reader schema only.
> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java, line 86
> > <https://reviews.apache.org/r/11155/diff/1/?file=291823#file291823line86>
> >
> >     writerSchema = readerSchema

I think it is a cleaner solution to leave this resolution to the avro libs.  
- Harvey
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11155/#review20547
-----------------------------------------------------------
On May 14, 2013, 10:18 p.m., Harvey Chong wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11155/
> -----------------------------------------------------------
>
> (Updated May 14, 2013, 10:18 p.m.)
>
>
> Review request for pig and Rohini Palaniswamy.
>
>
> Description
> -------
>
> Changes for https://issues.apache.org/jira/browse/PIG-3321
>
> Overview:
> AvroStorage.java - If 'schema' argument is passed to constructor, use it as the reader schema on load.  Moved getSchema() to AvroStorageUtils and made public+static, so it can be called from PigAvroRecordReader.
> AvroStorageUtils.java - Moved getSchema() here.
> PigAvroInputFormat.java - nothing functional here, just renamed 'schema' to 'readerSchema' for clarity.
> PigAvroRecordReader.java - The constructor now determines the writer schema for its split, and passes both reader and writer schema to the PigAvroDatumReader constructor, which will allow the Avro code to resolve the two.
> PigAvroDatumReader.java - Changed readRecord() to add entries to the output Tuple in writer order rather than reader order.
> TestAvroStorage.java - Added a new testcase for user specified schema in load.
>
>
> Diffs
> -----
>
>   http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1482017
>   http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorageUtils.java 1482017
>   http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroDatumReader.java 1482017