Re: Review Request: Adding JSONRecordReader

This is an automatically generated e-mail. To reply, visit:
In general, can you give a quick explanation of your design? I'm not entirely clear on how the various schemas relate to the other data types and schemas already available in the code base. Clearly, some of this should be specific to this reader. Other parts should probably be shared across the various record readers.

    It is not clear to me why we need this.  Can't we use the MajorType/MinorType stuff?  Can you explain why we have this as well?




    You should never pass a dead byte buf into this method.


    Why are you overriding?  Isn't this exactly what the above method does?


    Does this method (here in super class) also set the value to not null?


    You shouldn't modify this block.  This block is for REQUIRED types.  For nullable, you should use the second block below that is commented out.  You can just uncomment the single NullableFixed4 value.


    Same as issue above.


    DataMode should be OPTIONAL for Nullable fields.
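
    As a rough illustration of the point above, the following Java sketch maps a field's nullability to a data mode. The `DataMode` enum and `FieldSpec` class here are hypothetical stand-ins defined only for this example; they are not the Drill classes touched by the patch, and the real types may differ.

```java
// Hypothetical stand-ins for illustration only; not the actual Drill classes.
enum DataMode { REQUIRED, OPTIONAL, REPEATED }

final class FieldSpec {
    final String name;
    final boolean nullable;

    FieldSpec(String name, boolean nullable) {
        this.name = name;
        this.nullable = nullable;
    }

    // A nullable field may be missing in some records, so it is OPTIONAL.
    // A non-nullable field must carry a value in every record, so it is REQUIRED.
    DataMode mode() {
        return nullable ? DataMode.OPTIONAL : DataMode.REQUIRED;
    }
}

class DataModeSketch {
    public static void main(String[] args) {
        FieldSpec age = new FieldSpec("age", true);
        FieldSpec id = new FieldSpec("id", false);
        System.out.println(age.name + " -> " + age.mode());
        System.out.println(id.name + " -> " + id.mode());
    }
}
```

    The idea is simply that the mode follows from nullability in one place, so a nullable vector is never declared with the REQUIRED mode by accident.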


    Why did you remove these?


    Ideally, we would maintain non-changing vectors across batches rather than recreating them each time.  This is fine for now, though.
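
    A minimal sketch of what reusing a vector across batches could look like, assuming a simple fixed-width column. `ReusableIntColumn` and `BatchReuseSketch` are hypothetical names invented for this example and do not correspond to Drill's vector classes. The sketch allocates the backing storage once, ends a batch when the column cannot hold another value, and resets the column instead of recreating it.

```java
// Hypothetical fixed-width int column for illustration; not Drill's vector API.
final class ReusableIntColumn {
    private final int[] values;
    private int count;

    ReusableIntColumn(int capacity) {
        this.values = new int[capacity]; // allocated once, reused for every batch
    }

    // Returns false when the column is full, signalling the caller to end the batch.
    boolean add(int value) {
        if (count == values.length) {
            return false;
        }
        values[count++] = value;
        return true;
    }

    int size() {
        return count;
    }

    // Starts a new batch without reallocating the backing storage.
    void reset() {
        count = 0;
    }
}

class BatchReuseSketch {
    // Writes all records through one column and returns the number of batches produced.
    // Assumes the column capacity is at least 1, so a value always fits after a reset.
    static int writeAll(int[] records, ReusableIntColumn column) {
        int batches = 0;
        for (int record : records) {
            if (!column.add(record)) {
                batches++;          // current batch is full: hand it off
                column.reset();     // reuse the same storage for the next batch
                column.add(record); // retry the value that did not fit
            }
        }
        if (column.size() > 0) {
            batches++;              // count the final, partially filled batch
        }
        return batches;
    }

    public static void main(String[] args) {
        int[] records = {1, 2, 3, 4, 5};
        System.out.println(writeAll(records, new ReusableIntColumn(2)));
    }
}
```

    The capacity assumption mirrors the one stated in the review request description: a batch must always be able to hold at least one item.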


    You need to add <scope>test</scope>.
- Jacques Nadeau
On May 31, 2013, 11:47 p.m., Timothy Chen wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11587/
> -----------------------------------------------------------
> (Updated May 31, 2013, 11:47 p.m.)
> Review request for drill and Jacques Nadeau.
> Description
> -------
> Added the JSONRecordReader based on the previous ScanJson work.
>  Does not support nested fields, maps or lists yet.
>  Currently it moves on to the next batch when any field's batch cannot hold another value for the item being written. This also assumes the default batch size can always hold at least one item from any field (which is only a problem for variable-length vectors).
> Diffs
> -----
>   sandbox/prototype/common/src/main/java/org/apache/drill/common/physical/schema/DiffSchema.java PRE-CREATION
>   sandbox/prototype/common/src/main/java/org/apache/drill/common/physical/schema/Field.java PRE-CREATION