Avro >> mail # user >> Has anyone developed a utility to tell what is missing from a record?


Re: Has anyone developed a utility to tell what is missing from a record?
AVRO-1284 [0] has the first patch toward improving schema validation, so
that schema polymorphism handles validation. Philip and Jonathan, a review
would be nice. Upvotes would be nice too, but constructive feedback is
probably even better.

I think there's a further generalization that allows schema objects
(schemata?) to recursively call back to a generic data-and-schema
parallel-walker ("validate" would be one of those, but so could be
"default-filler"[1]).
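
To make the parallel-walker idea concrete, here is a minimal sketch, not the AVRO-1284 patch and not part of the Avro Python library. It assumes the schema is plain JSON-decoded Python data (dicts, lists, strings) and handles only primitives, records, arrays, maps, and unions. The names find_mismatches and PRIMITIVE_CHECKS are hypothetical, and treating a field absent from the datum as an error is an assumption of the sketch.

```python
# Hypothetical sketch: walk a datum and a JSON-decoded Avro schema together,
# carrying the current path so every mismatch is reported with its location.
# Named-type references, enums, and fixed types are not handled here.

def _is_int(d):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(d, int) and not isinstance(d, bool)

def _is_number(d):
    return isinstance(d, (int, float)) and not isinstance(d, bool)

PRIMITIVE_CHECKS = {
    "null": lambda d: d is None,
    "boolean": lambda d: isinstance(d, bool),
    "int": _is_int,
    "long": _is_int,
    "float": _is_number,
    "double": _is_number,
    "string": lambda d: isinstance(d, str),
    "bytes": lambda d: isinstance(d, bytes),
}

def find_mismatches(schema, datum, path="$"):
    """Return a list of (path, message) pairs; an empty list means a match."""
    if isinstance(schema, list):  # union: the datum must fit some branch
        if any(not find_mismatches(branch, datum, path) for branch in schema):
            return []
        return [(path, f"{datum!r} matches no branch of the union")]
    if isinstance(schema, str):  # bare type name such as "string"
        schema = {"type": schema}
    kind = schema["type"]
    if isinstance(kind, (list, dict)):  # nested type declaration
        return find_mismatches(kind, datum, path)
    if kind in PRIMITIVE_CHECKS:
        if PRIMITIVE_CHECKS[kind](datum):
            return []
        return [(path, f"expected {kind}, got {datum!r}")]
    if kind == "record":
        if not isinstance(datum, dict):
            return [(path, f"expected record, got {datum!r}")]
        errors = []
        for field in schema["fields"]:
            child = f"{path}.{field['name']}"
            if field["name"] not in datum:
                errors.append((child, "field is missing"))  # sketch assumption
            else:
                errors.extend(
                    find_mismatches(field["type"], datum[field["name"]], child))
        return errors
    if kind == "array":
        if not isinstance(datum, list):
            return [(path, f"expected array, got {datum!r}")]
        errors = []
        for i, item in enumerate(datum):
            errors.extend(find_mismatches(schema["items"], item, f"{path}[{i}]"))
        return errors
    if kind == "map":
        if not isinstance(datum, dict):
            return [(path, f"expected map, got {datum!r}")]
        errors = []
        for key, value in datum.items():
            errors.extend(
                find_mismatches(schema["values"], value, f"{path}[{key!r}]"))
        return errors
    return [(path, f"schema type {kind!r} is not handled by this sketch")]
```

A boolean "validate" in the old style would then just be `not find_mismatches(schema, datum)`, while an error message could list each returned path.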

That might be a bit tricky to build without breaking backwards
compatibility with older Python usage, but ... maybe not.  Older methods
should be implementable in these terms.

I'll file a separate issue to provide this refactoring into a base walker
class.

--Jeremy

[0] https://issues.apache.org/jira/browse/AVRO-1284
[1] https://issues.apache.org/jira/browse/AVRO-1265

On Thu, Apr 4, 2013 at 9:16 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:

> Ok, cool. I've been using the python implementation pretty heavily and
> didn't realize that it was less mature. Will definitely work on maturing it
> where possible :)
>
>
> 2013/4/4 Philip Zeyliger <[EMAIL PROTECTED]>
>
>> Hi Jonathan,
>>
>> The python implementation is definitely less mature than the Java one.
>>  As you run into things, please do file bugs (and, better yet, fixes!).
>>
>> At one point someone on this list was working on an alternative python
>> implementation that generated python objects to represent the Avro records.
>>  I think that's a wise idea (and is what Thrift does).  Not sure where
>> that's gone.
>>
>> -- Philip
>>
>>
>> On Thu, Apr 4, 2013 at 9:02 AM, Jonathan Coveney <[EMAIL PROTECTED]> wrote:
>>
>>> I'm also running into issues where the Python and Java implementations
>>> are different (it seems like Java is less permissive than Python). Are
>>> these cases bugs? It can be frustrating for something to work in one but
>>> not the other.
>>>
>>> Having the info from the parallel recursion would allow us to have much
>>> better error messages. That would be great...
>>>
>>>
>>> 2013/4/4 Jeremy Kahn <[EMAIL PROTECTED]>
>>>
>>>> I think this would be tremendously useful.
>>>>
>>>> I am working - in my copious spare time - on improving schema
>>>> validation in the Python library, and I think I can see how to improve
>>>> things there by extending the data/schema parallel recursion to keep track
>>>> of position in each.
>>>>
>>>> Jeremy
>>>> On Apr 4, 2013 6:58 AM, "Jonathan Coveney" <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> I'm working on migrating an internally developed serialization format
>>>>> to Avro. In the process, there have been many cases where I made a mistake
>>>>> migrating the schema (I've automated it), and then avro cries that a record
>>>>> I'm trying to serialize doesn't match the schema. Generally, the error it
>>>>> gives doesn't help find the actual issue, and for a big enough record
>>>>> finding the issue can be tedious.
>>>>>
>>>>> I've thought about making a tool which, given the schema and the
>>>>> record would tell you what the issue is, but I'm wondering if this already
>>>>> exists? I suppose the error message could also include this information...
>>>>>
>>>>> Thanks
>>>>> Jon
>>>>>
>>>>
>>>
>>
>