Hi, we have a case that looks like this:

Data set 1 has a schema with a field: { "name": "x", "type": "string" }
We have app1, which does a generic .get("x") retrieval.
This application has become long-lived and we don't want to (and maybe can't) change it.

We want to change the name of the field. Let's say our new field name is "y". According to the docs/spec, we are supposed to handle that with aliases: a new producer can create data referencing the improved name "y", and an old consumer can go on thinking in terms of "x" without having to do any work.

The problem is that the world changes, and the field name really should be "y" and not "x". We want to do this because the schema should make sense in its current context, and that current context is important. For example, we used to call it "horse_drawn_carriage" and now we want to call it "automobile", or "pda" becomes "mobile_device". Lots of things change over time. There are many real-world examples that I don't want to (or can't) get into the weeds about; hopefully my two random ones are enough to illustrate that the problem is real.

We also have cases where the name will likely change again over time. If we kept using the current approach and kept adding to aliases, you would not know which of those aliases is really the current one, which is why we favor having the field name carry the current context.

So we do this:

Data set 2 has a schema with a field: { "name": "y", "type": "string", "aliases": ["x"] }
We have app2, which does a generic .get("y") retrieval, because that is how folks now know to build their apps. The problem is that aliases are not bidirectional, so the old app cannot reference "x" to get at our data, and it breaks :(

So we came up with a patch that handles this, roughly:

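import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

public static Object resolveField(GenericRecord genericRecord, String fieldName) {
    for (Schema.Field field : genericRecord.getSchema().getFields()) {
        // Direct match on the current field name.
        if (field.name().equals(fieldName)) { return genericRecord.get(fieldName); }
        // Otherwise treat any alias as another name for this field.
        for (String alias : field.aliases()) {
            if (fieldName.equals(alias)) { return genericRecord.get(field.name()); }
        }
    }
    return null; // no field name or alias matched
}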
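
To make the intended behavior concrete, here is a minimal, self-contained sketch of the same lookup. It deliberately does not use the Avro classes: Field and Record below are hypothetical stand-ins for Schema.Field and GenericRecord, assumed only so the example runs on its own, and the class name AliasLookupSketch is made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AliasLookupSketch {

    // Hypothetical stand-in for Schema.Field: a current name plus historic aliases.
    static final class Field {
        final String name;
        final List<String> aliases;

        Field(String name, String... aliases) {
            this.name = name;
            this.aliases = Arrays.asList(aliases);
        }
    }

    // Hypothetical stand-in for GenericRecord: values are stored under the
    // current field name only, so a plain get() knows nothing about aliases.
    static final class Record {
        final List<Field> fields;
        final Map<String, Object> values = new LinkedHashMap<>();

        Record(List<Field> fields) {
            this.fields = fields;
        }

        void put(String name, Object value) {
            values.put(name, value);
        }

        Object get(String name) {
            return values.get(name);
        }
    }

    // Accepts either the current field name or any of its aliases.
    static Object resolveField(Record record, String fieldName) {
        for (Field field : record.fields) {
            if (field.name.equals(fieldName) || field.aliases.contains(fieldName)) {
                return record.get(field.name);
            }
        }
        return null; // no field name or alias matched
    }

    public static void main(String[] args) {
        // Data set 2: the field is now called "y", with "x" kept as an alias.
        Record record = new Record(Arrays.asList(new Field("y", "x")));
        record.put("y", "automobile");

        System.out.println(record.get("x"));           // old app, plain lookup: null
        System.out.println(resolveField(record, "x")); // old app, alias-aware: automobile
        System.out.println(resolveField(record, "y")); // new app: automobile
    }
}
```

In this sketch the old name only resolves when the lookup itself consults the aliases, which is the behavior we are after for app1.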

I wanted to check first whether we were missing something as we went through this, or whether changing alias behavior this way is at odds with some principle the community holds that we don't understand or aren't properly grokking. I am very open to the idea that we have gone down the wrong path here; however, it does seem to solve our core problem of keeping the context of the schema current. I can see this being a problem that others have too, not just us and our use case.

If folks are in sync with this change, I would like to propose/create a patch to make aliases work bidirectionally, allowing folks to use the name field as "the current context of the name of the thing", with the list of aliases holding the historic names.



~ Joe Stein