Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Pig >> mail # dev >> Re: Review Request: Changes for PIG-3321


Copy link to this message
-
Re: Review Request: Changes for PIG-3321


> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 92
> > <https://reviews.apache.org/r/11155/diff/1/?file=291819#file291819line92>
> >
> >     userSpecifiedInputAvroSchema - just to make reading code easier.

Can you make it userSpecifiedAvroSchema as it is common for both load and store?
> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java, line 550
> > <https://reviews.apache.org/r/11155/diff/1/?file=291819#file291819line550>
> >
> >     Can we do this for same and schema_uri too?
>
> Harvey Chong wrote:
>     Do you mean that for the 'same' and 'schema_file' options, the specified schema should be populated in userSpecifiedInputAvroSchema?  I do not see a 'schema_uri' option.

Its in the trunk. You are looking at branch-0.11
> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroDatumReader.java, line 69
> > <https://reviews.apache.org/r/11155/diff/1/?file=291821#file291821line69>
> >
> >     Why is this temporary array required when data is already being read in in.readFieldOrder() ? Am I missing something?
>
> Harvey Chong wrote:
>     The data is being read in the order it was written by the writer, based on the writer schema.  We want to reorder the fields to be in the order specified by the reader schema.

Got it. Instead of creating a temporary array there, can you create the tuple with readOrderedFields.size and do tuple.set(fs.pos(),..) instead of append ?
> On May 14, 2013, 11:45 p.m., Rohini Palaniswamy wrote:
> > http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java, line 86
> > <https://reviews.apache.org/r/11155/diff/1/?file=291823#file291823line86>
> >
> >     writerSchema = readerSchema
>
> Harvey Chong wrote:
>     I think it is a cleaner solution to leave this resolution to the avro libs.

Sounds good.
- Rohini
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11155/#review20547
-----------------------------------------------------------
On May 14, 2013, 10:18 p.m., Harvey Chong wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11155/
> -----------------------------------------------------------
>
> (Updated May 14, 2013, 10:18 p.m.)
>
>
> Review request for pig and Rohini Palaniswamy.
>
>
> Description
> -------
>
> Changes for https://issues.apache.org/jira/browse/PIG-3321
>
> Overview:
> AvroStorage.java - If 'schema' argument is passed to constructor, use it as the reader schema on load.  Moved getSchema() to AvroStorageUtils and made public+static, so it can be called from PigAvroRecordReader.
> AvroStorageUtils.java - Moved getSchema() here.
> PigAvroInputFormat.java - nothing functional here, just renamed 'schema' to 'readerSchema' for clarity.
> PigAvroRecordReader.java - The constructor now determines the writer schema for its split, and passes both reader and writer schema to the PigAvroDatumReader constructor, which will allow the Avro code to resolve the two.
> PigAvroDatumReader.java - Changed readRecord() to add entries to the output Tuple in writer order rather than reader order.
> TestAvroStorage.java - Added a new testcase for user specified schema in load.
>
>
> Diffs
> -----
>
>   http://svn.apache.org/repos/asf/pig/branches/branch-0.11/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java 1482017