Avro >> mail # user >> Has anyone developed a utility to tell what is missing from a record?

Re: Has anyone developed a utility to tell what is missing from a record?
AVRO-1284 [0] has the first patch in improving the schema validation so
that schema polymorphism handles validation. Philip and Jonathan, a review
would be nice. Upvotes would be nice too, but constructive feedback is
probably even better.

I think there's a further generalization that allows schema objects
(schemata?) to recursively call back to a generic data-and-schema
parallel-walker ("validate" would be one of those, but so could be

That might be a bit tricky to build without breaking backwards
compatibility to older Python uses, but ... maybe not.  Older methods
should be implementable in these terms.
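
To make the parallel-walker idea concrete, here is a minimal sketch in plain
Python 3. It is not the AVRO-1284 patch and does not use the avro library's
classes: it assumes the schema is the already-parsed JSON form (dicts, lists
and strings), and the names `walk`, `validate` and `PRIMITIVES` are
hypothetical. It covers only records, arrays, maps, unions and a few
primitives, and it reports an absent record field as missing, which is a
simplification.

```python
# Hypothetical sketch: walk a datum and a JSON-style Avro schema in parallel,
# tracking the position in the datum so each mismatch can be reported by path.

PRIMITIVES = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "long": lambda d: isinstance(d, int) and not isinstance(d, bool),
    "float": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
    "double": lambda d: isinstance(d, (int, float)) and not isinstance(d, bool),
    "string": lambda d: isinstance(d, str),
    "bytes": lambda d: isinstance(d, bytes),
}


def walk(schema, datum, path="$"):
    """Yield (path, message) for each place the datum fails to match the schema."""
    if isinstance(schema, list):
        # Union: the datum is fine if at least one branch reports no problems.
        if not any(not list(walk(branch, datum, path)) for branch in schema):
            yield path, "matches no branch of union"
        return
    if isinstance(schema, str):
        schema = {"type": schema}
    kind = schema["type"]
    got = type(datum).__name__
    if kind == "record":
        if not isinstance(datum, dict):
            yield path, "expected record, got %s" % got
            return
        for field in schema["fields"]:
            name = field["name"]
            if name not in datum:
                yield "%s.%s" % (path, name), "missing field"
            else:
                yield from walk(field["type"], datum[name], "%s.%s" % (path, name))
    elif kind == "array":
        if not isinstance(datum, list):
            yield path, "expected array, got %s" % got
            return
        for i, item in enumerate(datum):
            yield from walk(schema["items"], item, "%s[%d]" % (path, i))
    elif kind == "map":
        if not isinstance(datum, dict):
            yield path, "expected map, got %s" % got
            return
        for key, value in datum.items():
            yield from walk(schema["values"], value, "%s[%r]" % (path, key))
    elif kind in PRIMITIVES:
        if not PRIMITIVES[kind](datum):
            yield path, "expected %s, got %s" % (kind, got)
    else:
        yield path, "schema type %r not handled in this sketch" % (kind,)


def validate(schema, datum):
    """The older boolean check, expressed in terms of the walker."""
    return next(walk(schema, datum), None) is None


if __name__ == "__main__":
    user = {"type": "record", "name": "User", "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"]},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ]}
    for where, why in walk(user, {"name": "Jon", "age": "42", "tags": ["a", 7]}):
        print(where, why)
```

For the record in the usage lines, the walker reports problems at `$.age`
and `$.tags[1]`, which is the kind of position information a plain
true/false `validate` cannot give.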

I'll file a separate issue to provide this refactoring into a base walker


[0] https://issues.apache.org/jira/browse/AVRO-1284
[1] https://issues.apache.org/jira/browse/AVRO-1265
On Thu, Apr 4, 2013 at 9:16 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Ok, cool. I've been using the python implementation pretty heavily and
> didn't realize that it was less mature. Will definitely work on maturing it
> where possible :)
> 2013/4/4 Philip Zeyliger <[EMAIL PROTECTED]>
>> Hi Jonathan,
>> The python implementation is definitely less mature than the Java one.
>> As you run into things, please do file bugs (and, better yet, fixes!).
>> At one point someone on this list was working on an alternative python
>> implementation that generated python objects to represent the Avro records.
>>  I think that's a wise idea (and is what Thrift does).  Not sure where
>> that's gone.
>> -- Philip
>> On Thu, Apr 4, 2013 at 9:02 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>> I'm also running into issues where the Python and Java implementations
>>> are different (it seems like Java is less permissive than Python). Are
>>> these cases bugs? It can be frustrating for something to work in one but
>>> not the other.
>>> Having the info from the parallel recursion would allow us to have much
>>> better error messages. That would be great...
>>> 2013/4/4 Jeremy Kahn <[EMAIL PROTECTED]>
>>>> I think this would be tremendously useful.
>>>> I am working - in my copious spare time - on improving schema
>>>> validation in the Python library, and I think I can see how to improve
>>>> things there by extending the data/schema parallel recursion to keep track
>>>> of position in each.
>>>> Jeremy
>>>> On Apr 4, 2013 6:58 AM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
>>>>> I'm working on migrating an internally developed serialization format
>>>>> to Avro. In the process, there have been many cases where I made a mistake
>>>>> migrating the schema (I've automated it), and then Avro complains that a record
>>>>> I'm trying to serialize doesn't match the schema. Generally, the error it
>>>>> gives doesn't help find the actual issue, and for a big enough record
>>>>> finding the issue can be tedious.
>>>>> I've thought about making a tool which, given the schema and the
>>>>> record would tell you what the issue is, but I'm wondering if this already
>>>>> exists? I suppose the error message could also include this information...
>>>>> Thanks
>>>>> Jon