Avro >> mail # user >> question about completely untagged data...


Re: question about completely untagged data...
If your schemas are next to your data and part of the same storage system,
aren't you also similarly worried about protecting your data against loss
and corruption?

I'm not sure why one would be treated separately from the other for backups,
disaster prevention, or recovery.

And you may well want to look at just adding a separate backend (if needed)
to HAvroBase ... it sounds like it is already most of the way towards what
you want.

 - Bruce

On Mon, Nov 29, 2010 at 11:50 AM, David Jeske <[EMAIL PROTECTED]> wrote:

> On Sun, Nov 28, 2010 at 8:44 PM, Bruce Mitchener <
> [EMAIL PROTECTED]> wrote:
>
>> To be clear, HAvroBase stores tuples of (schema ID, data) and then looks
>> up the schema from that ID.  It doesn't store each schema separately /
>> entirely alongside the corresponding data records / entries.
>
>
> Ahh, yes, that's analogous to what I'm planning to do as well. The
> Schema-ID points to a directory of user-supplied schemas. However, it's
> important for me to have a contingency plan in case somehow, someday,
> corruption disconnects the schema-ID from the actual schema.
>
> I think putting a packed-binary format of the field-type-info into each
> record would give me what I want, with space usage comparable to Thrift's
> overall. It also seems like the kind of thing that could (possibly)
> one day be a supported mechanism of Avro without actually changing the
> existing binary format. The best of all worlds.
>
> As a bonus, there are situations where the schemas I'll be using are so
> unchanging and common (i.e. embedded in code) that there really isn't any
> fear of them being lost. In these cases it's nice that Avro can be used to
> pack and unpack things without any field-type overhead.
>
> Thanks for the comments.
>
>
>
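
For anyone following the thread, here is a rough sketch of the two ideas
discussed above: records stored as (schema ID, data) with the schema looked
up by ID, plus an optional packed run of field-type tags inside each record
as a fallback if the ID-to-schema link is ever lost. Everything in it is an
assumption made for illustration: the names (SCHEMAS, pack_record,
unpack_record), the one-byte tags, and the fixed-width encoding are
hypothetical and are not Avro's binary format or HAvroBase's storage layout.

```python
import struct

# Hypothetical one-byte type tags (an assumption, not part of Avro).
TAG_INT, TAG_STRING = 1, 2

# Hypothetical schema directory: schema ID -> ordered (field name, type tag).
SCHEMAS = {7: [("id", TAG_INT), ("name", TAG_STRING)]}


def encode_values(values, tags):
    """Encode values in field order; fixed-width ints keep the sketch short."""
    out = bytearray()
    for value, tag in zip(values, tags):
        if tag == TAG_INT:
            out += struct.pack(">q", value)
        elif tag == TAG_STRING:
            raw = value.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
        else:
            raise ValueError(f"unknown type tag {tag}")
    return bytes(out)


def decode_values(payload, tags):
    """Decode values using only the ordered type tags."""
    values, pos = [], 0
    for tag in tags:
        if tag == TAG_INT:
            (value,) = struct.unpack_from(">q", payload, pos)
            pos += 8
        elif tag == TAG_STRING:
            (length,) = struct.unpack_from(">I", payload, pos)
            pos += 4
            value = payload[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unknown type tag {tag}")
        values.append(value)
    return values


def pack_record(schema_id, values, embed_tags=True):
    """Layout: schema ID (4 bytes), tag count (1 byte), tags, then data."""
    tags = bytes(tag for _, tag in SCHEMAS[schema_id])
    embedded = tags if embed_tags else b""
    header = struct.pack(">IB", schema_id, len(embedded))
    return header + embedded + encode_values(values, tags)


def unpack_record(blob, schemas):
    """Prefer the schema directory; fall back to the embedded type tags."""
    schema_id, tag_count = struct.unpack_from(">IB", blob, 0)
    offset = 5
    embedded = blob[offset:offset + tag_count]
    offset += tag_count
    fields = schemas.get(schema_id)
    if fields is not None:
        names = [name for name, _ in fields]
        tags = [tag for _, tag in fields]
    elif tag_count:
        # Schema lost: types survive, field names do not.
        tags = list(embedded)
        names = [f"field{i}" for i in range(tag_count)]
    else:
        raise LookupError(f"schema {schema_id} missing and no embedded tags")
    return dict(zip(names, decode_values(blob[offset:], tags)))
```

With the directory intact, unpack_record(pack_record(7, [42, "avro"]), SCHEMAS)
recovers the named fields. With an empty directory it still recovers the
values, under positional names, at a cost of one byte per field here. Passing
embed_tags=False covers the "schema is embedded in code" case: no per-field
overhead, but no fallback either.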