aliases, doc and non-standard tags
Alan Miller 2013-01-10, 14:00
Hi,
I have a complex schema consisting of primitives, multiple records, and multiple array-of-records fields. That's all great, and everything works fine when I process these Avro data files via Hadoop.

My question regards the "doc" and "aliases" tags. Basically, I want a way to mark some fields with a flag that will allow my MapReduce code to process only a subset of the fields. For example, some fields contain static info about a device (name, ipaddr, etc.), so my code would just sync the values to a DB. Other fields, like cpu, mem, and load util, would be flagged differently, and Hadoop would do some calculations on the values of those fields.

Any suggestions on how to use the schema to encode this meta info? I tried these 3 methods:

1. I tried to use the "doc" field like this:

    ...
    "fields": [
      {"name": "time_stamp", "doc": "info", "type": "long"},
      {"name": "hostname", "doc": "info", "type": "string"},

but using this code:

    List<String> fields = new ArrayList<String>();
    List<Schema.Field> fieldList = schema.getFields();
    for (Field field : fieldList) {
        String fName = field.name();
        String fDoc = field.schema().getDoc();
        System.out.printf("Name: [%s]\n\tDoc : %s", fName, fDoc);
    }

I get "null"s:

    Name: [time_stamp]
        Doc : null
    Name: [hostname]
        Doc : null

2. I tried to use the "aliases" tag with this schema:

    "aliases": ["MyRec","info.hostname.hostid","calc.cpuutil.memutil"],

and this code:

    Set<String> aliases = schema.getAliases();
    Iterator<String> aIter = aliases.iterator();
    while (aIter.hasNext()) {
        String alias = aIter.next();
        System.out.printf("ALIAS: %s", alias);
    }

which sort of works, but it's limited because the schema parser only allows letters, numbers, and dots in the alias string:

    ALIAS: com.company.app.MyRecord
    ALIAS: info.hostname.hostid
    ALIAS: calc.cpuutil.memutil

3. I also noticed I can add a "non-standard" field to the schema, like

    "info": "hostname,hostid",

and the ant schema compiler task adds it to the schema. I can then retrieve the "non-standard" field's value with:

    String info = schema.getJsonProp("info").getTextValue();
    System.out.printf("INFO FIELD: %s", info);

which prints:

    INFO FIELD: hostname,hostid

Regards,
Alan Miller
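For illustration, here is how the value from method 3 might be turned into something the MapReduce code can test field names against. This is only a minimal sketch using plain Java collections, not Avro API: the class `FieldTags` and the methods `parseFieldList` and `groupByTag` are hypothetical names, and the "calc" entry is an assumption that mirrors the "calc.cpuutil.memutil" alias from method 2. It assumes the property text has already been read from the schema as shown above.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Avro): groups field names by a tag.
final class FieldTags {

    // Splits a comma-separated property value such as "hostname,hostid"
    // into an ordered set of field names, trimming whitespace and
    // skipping empty entries. A null value yields an empty set.
    static Set<String> parseFieldList(String value) {
        Set<String> names = new LinkedHashSet<>();
        if (value == null) {
            return names;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Builds a map from tag name (e.g. "info") to the field names listed
    // under that tag, preserving the order of the input map.
    static Map<String, Set<String>> groupByTag(Map<String, String> props) {
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            groups.put(e.getKey(), parseFieldList(e.getValue()));
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("info", "hostname,hostid");  // value shown in the post
        props.put("calc", "cpuutil,memutil");  // assumed second tag
        System.out.println(groupByTag(props));
    }
}
```

A mapper could then check something like `groups.get("info").contains(fieldName)` to decide whether a field is only synced to the DB or also fed into the calculations.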