aliases, doc and non-standard tags

I have a complex schema consisting of primitives, multiple records, and
multiple array-of-records fields. That all works fine when I process these
Avro data files via Hadoop.

My question regards the "doc" and "aliases" tags. Basically I want a way to
mark some fields with a flag that will allow my MapReduce code to only
process a subset of the fields.

For example, some fields contain static info about a device (name, ipaddr,
etc.), so my code would just sync the values to a DB. Other fields, like
cpu, mem, and load utilization, would be flagged differently, and Hadoop
would do some calculations on the values of those fields.

Any suggestions on how to use the schema to encode this meta info?
I tried these 3 methods:

1. I tried to use the "doc" field like this:
  "fields": [
    {"name": "time_stamp", "doc": "info", "type": "long"},
    {"name": "hostname", "doc": "info", "type": "string"},

but using this code:
  List<String> fields = new ArrayList<String>();
  List<Schema.Field> fieldList = schema.getFields();
  for (Schema.Field field : fieldList) {
    String fName = field.name();
    // getDoc() is called on the field's type schema here, not on the field itself
    String fDoc = field.schema().getDoc();
    System.out.printf("Name: [%s]\n\tDoc : %s\n", fName, fDoc);
  }
I get nulls:
    Name: [time_stamp]
    Doc : null
    Name: [hostname]
    Doc : null

2. I tried to use the "aliases" tag with this schema:
    "aliases": ["MyRec","info.hostname.hostid","calc.cpuutil.memutil"],
and this code:
  Set<String> aliases = schema.getAliases();
  Iterator<String> aIter = aliases.iterator();
  while (aIter.hasNext()) {
    String alias = aIter.next();
    System.out.printf("ALIAS: %s\n", alias);
  }
which sort of works, but it's limited because the schema parser only allows
letters, numbers, and dots in the alias string:
    ALIAS: com.company.app.MyRecord
    ALIAS: info.hostname.hostid
    ALIAS: calc.cpuutil.memutil
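
If the alias route is kept, the dotted strings could be decoded roughly as
below. This is only a sketch using plain java.util: the class AliasTags, the
decode helper, and the convention that the first segment is a tag and the
remaining segments are field names are assumptions for illustration, not
anything Avro defines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Avro.
final class AliasTags {
    // Groups field names by tag, reading each alias as "<tag>.<field>.<field>...".
    // Aliases whose first segment is not a known tag are ignored.
    static Map<String, List<String>> decode(Iterable<String> aliases, Set<String> knownTags) {
        Map<String, List<String>> byTag = new LinkedHashMap<>();
        for (String alias : aliases) {
            String[] parts = alias.split("\\.");
            // Skip ordinary aliases such as com.company.app.MyRecord
            if (parts.length < 2 || !knownTags.contains(parts[0])) {
                continue;
            }
            List<String> names = byTag.computeIfAbsent(parts[0], k -> new ArrayList<>());
            names.addAll(Arrays.asList(parts).subList(1, parts.length));
        }
        return byTag;
    }
}
```

Passing the set returned by schema.getAliases() together with the known tags
"info" and "calc" would give one list of field names per tag, which the
MapReduce code could consult per field.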

3. I also noticed I can add a "non-standard" field to the schema like
  "info": "hostname,hostid",
and the Ant schema compiler task adds it to the schema.

I can then retrieve the "non-standard" field's value with:
  String info = schema.getJsonProp("info").getTextValue();
  System.out.printf("INFO FIELD: %s",info);
which prints:
    INFO FIELD: hostname,hostid
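
To turn that comma-separated value into something the job can test fields
against, a small helper along these lines might do. It is a sketch under
assumptions: InfoFields and parse are made-up names, and it uses only
java.util, so it knows nothing about Avro itself.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Avro.
final class InfoFields {
    // Splits a value such as "hostname,hostid" into an ordered set of
    // trimmed, non-empty field names. A null value yields an empty set.
    static Set<String> parse(String propValue) {
        Set<String> names = new LinkedHashSet<>();
        if (propValue == null) {
            return names;
        }
        for (String part : propValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With the info string retrieved above, InfoFields.parse(info) gives a set, and
a check like contains(field.name()) inside the field loop would decide whether
a field is synced to the DB or fed into the calculations.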

Alan Miller