Bart Verwilst 2012-11-24, 13:23
On a quick look pass, this looks sane, and nests many-to-one sets of data
within the parent record.
A few things to think about:
* A double in mySQL will be a double in Avro, not a long.
* Each field that can be null in the database should be a union of null
and the field type. For example, if a schema was
CREATE TABLE "Foo" (
id int primary_key,
product_id int not null,
then the record would need three fields -- the first two are integers that
are not nullable, and the last one is a string that may be null:
On 11/24/12 5:23 AM, "Bart Verwilst" <[EMAIL PROTECTED]> wrote:
>I'm currently writing an importer to import our MySQL data into hadoop
>( as Avro files ). Attached you can find the schema i'm converting to
>Avro, along with the corresponding Avro schema i would like to use for
>my imported data. I was wondering if you guys could go over the schema
>and determine if this is sane/optimal, and if not, how i should improve
>As a sidenote, i converted bigints to long, and had 1 occurrence of
>double, which i also converted to long in the avro, not sure if that's
>the correct type?
>Thanks in advance for your expert opinions! ;)