Avro, mail # user - schema defaults not reflected in generated objects (1.3.2)


Re: schema defaults not reflected in generated objects (1.3.2)
Philip Zeyliger 2010-06-07, 22:53
On Mon, Jun 7, 2010 at 3:40 PM, Scott Carey <[EMAIL PROTECTED]> wrote:
>
> On Jun 7, 2010, at 3:30 PM, Doug Cutting wrote:
>
>> On 06/07/2010 03:11 PM, Bill de hOra wrote:
>>> This means writers can't leverage schema defaults, so writers should do
>>> something like this?
>>>
>>> Message message = new Message();
>>> // no defaults set
>>> String quux = message
>>> .getSchema()
>>> .getField("foo")
>>> .defaultValue()
>>> .getTextValue();
>>> message.foo=new Utf8(quux);
>>>
>>> [ignoring that the writer needs to know the schema type]. I suspect
>>> people will just write in garbage (like empty strings).
>>
>> No, we don't expect folks to do that.  If a writer never sets a field
>> then they might be better off dropping that field from their schema.  If
>> the writer only rarely sets it, then a schema which is a union with null
>> might be better, making the field optional.  But if the field is usually
>> set but it's awkward for the programmer to know whether it's set, then
>> automatically filling in a default might be a useful feature and the
>> default from the schema is probably a good value to use.
>>
>> Like Philip, I too am +1 for enhancing the SpecificCompiler to set
>> default values from the schema in generated code.  The only downside I
>> see is perhaps a slight performance loss: if the default value is always
>> overwritten then the allocation and setting of it will still be executed
>> for each instance.
>
> For my specific use case, defaults being set all the time would hurt performance quite a bit.  (the schema is not trivial -- ~5k in JSON)
>
> If the specific compiler generated a couple constructors --
> * A default empty argument constructor -- fills fields with defaults.
> * A constructor with all fields passed in -- assigns fields from the constructor and does nothing with defaults.

Unfortunately, the latter is bad for compatibility.  If there are two
fields of the same type, it's pretty easy (since there's not
necessarily a canonical order) for this to introduce a nasty
incompatibility.  This is not theoretical: Thrift ran smack into this.
Protocol Buffers have "Builder" objects, which I imagine don't satisfy your
performance worries.
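
To make the hazard concrete, here is a minimal sketch of how an
all-fields constructor can break silently.  The class and field names
are hypothetical and hand-written; this is not output from the
SpecificCompiler.  It assumes the same record gets regenerated with its
two string fields in a different order:

```java
public class PositionalConstructorHazard {

    // Hypothetical class generated from the schema with field order (first, last).
    static class PersonV1 {
        final String first;
        final String last;

        PersonV1(String first, String last) {
            this.first = first;
            this.last = last;
        }
    }

    // Hypothetical class regenerated after the field order became (last, first).
    static class PersonV2 {
        final String first;
        final String last;

        PersonV2(String last, String first) {
            this.last = last;
            this.first = first;
        }
    }

    public static void main(String[] args) {
        PersonV1 before = new PersonV1("Ada", "Lovelace");
        // Same call site, unchanged. It still compiles because both
        // parameters are Strings, but the values land in the wrong fields.
        PersonV2 after = new PersonV2("Ada", "Lovelace");

        System.out.println(before.first + " " + before.last);
        System.out.println(after.first + " " + after.last);
    }
}
```

The compiler cannot flag the second call, since the argument types
still match; only the meaning of each position has changed.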

If you have getters and setters, you can implement setFoo() with
"this.fieldsSet.setBit(FOO); this.foo = foo;" (i.e., use a bitmap to
remember what has and hasn't been set).  The current API doesn't use
setters, I think, though, so this wouldn't be easily backwards
compatible.
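
A rough sketch of the bitmap idea, assuming a single string field
"foo" with a schema default of "bar".  The class name, the FOO index,
the default value and the hasFoo()/getFoo() accessors are all made up
for illustration, and java.util.BitSet (whose method is set(), not
setBit()) stands in for whatever bitmap a generated class would
actually use:

```java
import java.util.BitSet;

public class TrackedMessage {
    // Hypothetical field index and schema default, for illustration only.
    private static final int FOO = 0;
    private static final String FOO_DEFAULT = "bar";

    // One bit per field; a set bit means the writer assigned that field.
    private final BitSet fieldsSet = new BitSet(1);
    private String foo;

    public void setFoo(String foo) {
        this.fieldsSet.set(FOO);
        this.foo = foo;
    }

    public boolean hasFoo() {
        return fieldsSet.get(FOO);
    }

    // The default is consulted only on read, and only if foo was never set,
    // so nothing is allocated per instance for fields that get overwritten.
    public String getFoo() {
        return hasFoo() ? foo : FOO_DEFAULT;
    }
}
```

With this shape, a writer that always calls setFoo() pays only for
setting one bit, which is the cost Scott was worried about, while a
writer that never sets the field still sees the default from getFoo().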