On Tue, Mar 18, 2014 at 11:31 PM, hiteshpahuja <[EMAIL PROTECTED]> wrote:
Ah, yes. Currently the Avro specification only allows the type of a field
to be a named type or a schema. At the moment, named types are only Record,
Enum, and Fixed[1].

That does mean that if one of the particular fields of your CommonData is
itself a named type you could reference it, but the usage is awkward.
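
To illustrate what "referencing a named type" looks like, here is a rough
sketch in plain Python (standard library only, not the Avro library). The
Order record, the "common" and "refundCurrency" fields, and the Currency enum
are made up for the example, and collect_names is a hypothetical helper that
only approximates the spec's name rules (it ignores aliases, for instance).

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
NAMED_KINDS = {"record", "enum", "fixed"}

# Hypothetical schema: the enum "Currency" is defined inline inside CommonData
# and then referenced by name from a later field.
SCHEMA = {
    "namespace": "recordData",
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "common", "type": {
            "type": "record",
            "name": "CommonData",
            "fields": [
                {"name": "currency", "type": {
                    "type": "enum", "name": "Currency", "symbols": ["USD", "EUR"]}},
            ],
        }},
        {"name": "refundCurrency", "type": "Currency"},
    ],
}


def fullname(name, namespace):
    """Qualify a short name with the enclosing namespace, if there is one."""
    return name if "." in name or not namespace else f"{namespace}.{name}"


def collect_names(schema, namespace=None, known=None):
    """Walk a schema in order, recording named types and checking references."""
    known = set() if known is None else known
    if isinstance(schema, str):
        # A bare string is either a primitive or a reference to a named type.
        if schema not in PRIMITIVES and fullname(schema, namespace) not in known:
            raise ValueError(f"undefined named type: {schema}")
    elif isinstance(schema, list):
        for branch in schema:  # union: check every branch
            collect_names(branch, namespace, known)
    elif schema["type"] in NAMED_KINDS:
        namespace = schema.get("namespace", namespace)
        known.add(fullname(schema["name"], namespace))
        for field in schema.get("fields", []):
            collect_names(field["type"], namespace, known)
    elif schema["type"] == "array":
        collect_names(schema["items"], namespace, known)
    elif schema["type"] == "map":
        collect_names(schema["values"], namespace, known)
    else:
        collect_names(schema["type"], namespace, known)
    return known


names = collect_names(SCHEMA)
print(sorted(names))
```

The reference to "Currency" works because it is a named type defined earlier
in the same schema. A reference to "recordId" would fail, since a field name
is not a named type.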

Expanding named types to include record fields would be an incompatible
change, because it might cause existing schemas to break. Specifically, if
a schema had a field with the same name as some other named type in the
same namespace, the collision would result in an error. If you want to work
out the details of such a change, you should file a JIRA issue.

There are a few things you could do now, but the one I'd recommend is to
rely on alias support.

For example, given these customized records:

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordDataFoo",
 "fields": [
     {"name": "recordId", "type": "string"},
     {"name":  "foo",  "type": ["string", "null"]}
 ]
}

{"namespace": "recordData",
 "type": "record",
 "name": "CustomizedRecordDataBar",
 "fields": [
     {"name": "bar", "type": "string"},
     {"name": "recordDate",  "type": "string"}
 ]
}

Then, when you want to read just the common fields, define a reader schema:

{"namespace": "recordData",
  "type": "record",
  "name": "CommonData",
  "aliases": ["CustomizedRecordDataFoo", "CustomizedRecordDataBar"],
  "fields" : [
     {"name": "recordId", "type": ["null", "string"], "default": null},
     {"name": "recordDate",  "type": ["null", "string"], "default": null},
     {"name": "recordPrice", "type": ["null", "int"], "default": null},
     {"name": "customer", "type": ["null", "string"], "default": null}
  ]
}

Using that reader schema should let you read records of both customized
versions, with whichever common fields are present being set and the rest
falling back to their null defaults.
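
To make the expected outcome concrete, here is a simplified sketch in plain
Python (standard library only). It is not the Avro library: decoded records
are modelled as dicts, names are compared without their namespace, and
resolve is a hypothetical helper that imitates only the alias match and the
per-field default rule for flat records. The sample values are invented.

```python
# Simplified, stdlib-only illustration of alias-based resolution for flat
# records. This is NOT the Avro library; decoded records are plain dicts.

READER = {
    "namespace": "recordData",
    "type": "record",
    "name": "CommonData",
    "aliases": ["CustomizedRecordDataFoo", "CustomizedRecordDataBar"],
    "fields": [
        {"name": "recordId", "type": ["null", "string"], "default": None},
        {"name": "recordDate", "type": ["null", "string"], "default": None},
        {"name": "recordPrice", "type": ["null", "int"], "default": None},
        {"name": "customer", "type": ["null", "string"], "default": None},
    ],
}


def resolve(writer_name, datum, reader=READER):
    """Project a decoded writer record onto the reader's fields (hypothetical helper)."""
    # The writer's record name must match the reader's name or one of its aliases.
    if writer_name != reader["name"] and writer_name not in reader.get("aliases", []):
        raise ValueError(f"{writer_name} does not match {reader['name']} or its aliases")
    resolved = {}
    for field in reader["fields"]:
        name = field["name"]
        if name in datum:
            resolved[name] = datum[name]       # field was written: keep its value
        elif "default" in field:
            resolved[name] = field["default"]  # field absent: use the reader default
        else:
            raise ValueError(f"no value or default for field {name}")
    return resolved  # writer-only fields such as "foo" or "bar" are dropped


foo = resolve("CustomizedRecordDataFoo", {"recordId": "r-1", "foo": "x"})
bar = resolve("CustomizedRecordDataBar", {"bar": "y", "recordDate": "2014-03-18"})
print(foo)
print(bar)
```

With the real library you would hand the writer schema and this reader schema
to the datum reader and let it do the resolution; the sketch only shows which
fields I'd expect to end up set.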

Issues to consider in this approach

1) You have to make sure the schemas of the individual fields resolve
according to spec rules[2]. The simplified version of this is to make sure
they're both string, int, or whatever (with the one in Common nullable).

2) If the field in the customized record is nullable, you won't be able to
tell the difference between the field not being present and being null. You
can mitigate this by using a known placeholder default instead.
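
Both issues can be sketched in plain Python (standard library only, with
dicts standing in for decoded records, so again not the Avro library). The
helpers and the "__not_set__" placeholder are hypothetical, and the type
check handles only primitive type names and ignores the promotions the spec
allows. One thing to keep in mind: as far as I know, in the 1.7.x spec a
union field's default has to match the first branch of the union, so a string
placeholder default means declaring the field as ["string", "null"] rather
than ["null", "string"].

```python
# Simplified illustration with plain dicts standing in for decoded records;
# this is not the Avro library.

NOT_SET = "__not_set__"  # hypothetical placeholder value


def non_null_branches(avro_type):
    """Return the non-null branches of a primitive type or union of primitives."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    return {b for b in branches if b != "null"}


def same_base_type(writer_type, reader_type):
    """Rough stand-in for issue 1: same base type, ignoring nullability.

    Real schema resolution also allows promotions (for example int to long),
    which this check deliberately ignores.
    """
    return non_null_branches(writer_type) == non_null_branches(reader_type)


def read_field(datum, name, default):
    """Return the written value, or the reader default if the field is absent."""
    return datum[name] if name in datum else default


absent = {"recordId": "r-1"}                          # writer has no "customer" field
written_null = {"recordId": "r-1", "customer": None}  # writer wrote an explicit null

# Issue 1: "string" in the customized record vs nullable string in CommonData.
print(same_base_type("string", ["null", "string"]))

# Issue 2: with a null default, absent and explicit null look identical.
print(read_field(absent, "customer", None), read_field(written_null, "customer", None))

# With a placeholder default, the two cases can be told apart.
print(read_field(absent, "customer", NOT_SET), read_field(written_null, "customer", NOT_SET))
```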

If you can stand some storage overhead, you can deal with the first issue
by using the all-nullable CommonData record in all of the customized
records and then only setting those fields you actually want used.
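
A sketch of what that embedding could look like, built as Python dicts and
dumped to JSON with the standard library. The field name "common" and the
customized helper are my own assumptions for illustration; the sample datum
is invented.

```python
import json

# All-nullable common record, with the same fields as the CommonData reader schema.
COMMON_DATA = {
    "type": "record",
    "name": "CommonData",
    "fields": [
        {"name": n, "type": ["null", t], "default": None}
        for n, t in [("recordId", "string"), ("recordDate", "string"),
                     ("recordPrice", "int"), ("customer", "string")]
    ],
}


def customized(name, extra_fields):
    """Build a customized record schema that embeds CommonData (hypothetical helper)."""
    return {
        "namespace": "recordData",
        "type": "record",
        "name": name,
        # "common" is an assumed field name, not from the original schemas.
        "fields": [{"name": "common", "type": COMMON_DATA}] + extra_fields,
    }


foo_schema = customized("CustomizedRecordDataFoo",
                        [{"name": "foo", "type": ["string", "null"]}])

# A Foo datum sets only the common fields it cares about; the rest stay null.
foo_datum = {"common": {"recordId": "r-1", "recordDate": None,
                        "recordPrice": None, "customer": None},
             "foo": "x"}

print(json.dumps(foo_schema, indent=1))
```

The storage overhead comes from the unset fields: each null still costs
roughly a byte for the union branch index in the binary encoding.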


[1]: http://avro.apache.org/docs/1.7.6/spec.html#Names
[2]: http://avro.apache.org/docs/1.7.6/spec.html#Schema+Resolution
