Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Hive, mail # dev - Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables


Copy link to this message
-
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
Jakob Homan 2013-07-12, 22:32

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review23102
-----------------------------------------------------------
Overall, looks good.  I'm concerned that there's no end-to-end test of having some non-Avro data in Hive, using Hive to write (and join it) to an Avro file, and verifying the content of the actual Avro file.  Would something like that be feasible?
ql/src/test/queries/clientpositive/avro_create_as_select.q
<https://reviews.apache.org/r/11925/#comment46910>

    This test doesn't actually work on a non-avro table.  It just works on the definition of a non-avro table.  We should have another one that does a select from a populated, non-avro-backed table and verify the values are converted corrected.

ql/src/test/queries/clientpositive/avro_without_schema.q
<https://reviews.apache.org/r/11925/#comment46911>

    I don't see a test that exercises the generated schema through whole mapreduce query, just ones that exercise the metadata store...

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46912>

    Should be a debug

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46913>

    spacing

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46916>

    We lose any column comments that may have been on the original schema.  Does that normally happen with CTAS in Hive?

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46921>

    Rather than building the avro schema string by hand and then generating the schema object, why not generate the avro schema and then run toString on it?  This guarantees the schema will be correctly formed and type checked.

serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46919>

    Use j here instead, so as not to shadow the i.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46922>

    Some spacing on the table would help.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46924>

    formatting

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46925>

    fot -> for.  type info -> TypeInfo

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46942>

    Is there any way to turn off the union wrapping?  If not, does it require its own separate parameter?

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46927>

    formatting

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46923>

    Move this above the first function so it's clear to readers.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46933>

    Yeah, do that.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46929>

    Doesn't this duplicate generateSchema above?

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46930>

    Hive TypeInfo allows for non-string keys, while Avro does not.  There should be a check here for that.

serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46931>

    can this be private?

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46934>

    nit: can this be made more readable?

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46936>

    Rather than hand-coding these, can we have Hive generate them?  This will catch any changes if Hive changes how it generates them later on.

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46938>

    Please separate out all the test cases into individual tests.

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
<https://reviews.apache.org/r/11925/#comment46939>

    assert on the content of the exception, or subtype it and use the expects annotation.

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46943>

    Either add msg string or break out into separate tests.

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46944>

    delete

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46946>

    delete

serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java
<https://reviews.apache.org/r/11925/#comment46947>

    odd name...
- Jakob Homan
On July 12, 2013, 11:49 a.m., Mohammad Islam wrote: