NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, cassandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Avro >> mail # user >> question about completely untagged data...


David Jeske 2010-11-29, 02:39
Philip Zeyliger 2010-11-29, 04:09
David Jeske 2010-11-29, 04:40
Re: question about completely untagged data...
To be clear, HAvroBase stores tuples of (schema ID, data) and then looks up
the schema from that ID.  It doesn't store a full copy of each schema
alongside the corresponding data records.
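A minimal Python sketch of the (schema ID, data) layout described above. The `SchemaRegistry` class and the `store`/`load` helpers are hypothetical names for illustration, not HAvroBase's actual API; the registry here is an in-memory dict, whereas a real store would persist it. Schemas are keyed by `json.dumps(..., sort_keys=True)` as a simple canonical form (this is not Avro's own Parsing Canonical Form). The payloads are assumed to be already Avro-encoded bytes, here single Avro strings (a zigzag-varint length followed by UTF-8 bytes).

```python
import json


class SchemaRegistry:
    """Hypothetical in-memory registry mapping small integer IDs to schemas."""

    def __init__(self):
        self._by_id = {}    # schema ID -> canonical JSON text
        self._by_text = {}  # canonical JSON text -> schema ID

    def register(self, schema: dict) -> int:
        """Return the schema's ID, assigning the next free ID if it is new."""
        text = json.dumps(schema, sort_keys=True)
        if text not in self._by_text:
            new_id = len(self._by_id) + 1
            self._by_id[new_id] = text
            self._by_text[text] = new_id
        return self._by_text[text]

    def lookup(self, schema_id: int) -> dict:
        """Fetch the full schema for an ID; raises KeyError if it was lost."""
        return json.loads(self._by_id[schema_id])


def store(registry: SchemaRegistry, schema: dict, payload: bytes) -> tuple:
    """Persist only (schema ID, data), not a full copy of the schema."""
    return (registry.register(schema), payload)


def load(registry: SchemaRegistry, row: tuple) -> tuple:
    """Resolve the ID back to the writer's schema before decoding the data."""
    schema_id, payload = row
    return registry.lookup(schema_id), payload


registry = SchemaRegistry()
user = {"type": "record", "name": "User",
        "fields": [{"name": "name", "type": "string"}]}
row_bob = store(registry, user, b"\x06Bob")      # Avro string "Bob"
row_alice = store(registry, user, b"\x0aAlice")  # Avro string "Alice"
```

Both rows carry the same small ID (1), so the schema text is kept once no matter how many records use it. The flip side is the risk raised in this thread: if the registry entry for an ID is lost, `lookup` fails and the payload bytes can no longer be interpreted.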

HAvroBase is really pretty nice, and it has backends for storing data in
systems other than HBase...

 - Bruce

On Mon, Nov 29, 2010 at 11:09 AM, Philip Zeyliger <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> Your assessment of Thrift and Avro being isomorphic is correct, and
> you've correctly identified the major philosophical difference.  (It's
> in fact a little bit deeper than you suggest: at read time, there are
> always two schemas available: the reader's schema and the original
> schema that the data was written with.)
>
> Where are you storing the Avro records?  Avro's binary format for
> records is unlikely to change: it's pretty stable and changing it would
> be a big deal.  On the other hand, Avro already has multiple ways of
> passing schema information along.  Avro's RPC implementations exchange
> protocols (and therefore schemas) during a connection handshake.  Avro
> data files store the schema in the file header.  You could, in your
> system, always store (schema, data) tuples.  That's what Sam is doing in
> HAvroBase
> (http://www.javarants.com/2010/06/30/havrobase-a-searchable-evolvable-entity-store-on-top-of-hbase-and-solr/).
>
> -- Philip
>
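To make the reader's-schema/writer's-schema point concrete, here is a simplified Python sketch of Avro-style schema resolution for flat records with `int`, `long`, and `string` fields only. The function names are hypothetical, and the sketch ignores unions, nested types, type promotion, and the record-name checks that the Avro specification also covers. Because the bytes carry no tags, they can only be walked with the writer's schema; the reader's schema is then applied by matching fields by name, dropping writer-only fields and filling reader-only fields from their defaults.

```python
from io import BytesIO


def read_long(buf: BytesIO) -> int:
    """Decode one Avro int/long: a base-128 varint holding a zigzag value."""
    shift, raw = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        raw |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag


def read_value(buf: BytesIO, ftype: str):
    if ftype in ("int", "long"):
        return read_long(buf)
    if ftype == "string":
        length = read_long(buf)
        return buf.read(length).decode("utf-8")
    raise NotImplementedError(f"type {ftype!r} not handled in this sketch")


def decode_resolved(data: bytes, writer: dict, reader: dict) -> dict:
    buf = BytesIO(data)
    # Step 1: walk the untagged bytes in the writer's field order.
    written = {f["name"]: read_value(buf, f["type"]) for f in writer["fields"]}
    # Step 2: project onto the reader's schema, matching fields by name.
    result = {}
    for f in reader["fields"]:
        if f["name"] in written:
            result[f["name"]] = written[f["name"]]
        elif "default" in f:
            result[f["name"]] = f["default"]  # field added after the data was written
        else:
            raise ValueError(f"no value or default for field {f['name']!r}")
    return result  # writer-only fields (here "age") are dropped


writer_v1 = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"}, {"name": "age", "type": "int"}]}
reader_v2 = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""}]}

# "Bob" (length 3 -> zigzag 0x06), then age 30 (zigzag 60 = 0x3c)
record = decode_resolved(b"\x06Bob\x3c", writer_v1, reader_v2)
```

Passing `reader_v2` as the writer schema instead would misread the same bytes, which is why the original writer's schema has to be recoverable for every stored record.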
> On Sun, Nov 28, 2010 at 6:39 PM, David Jeske <[EMAIL PROTECTED]> wrote:
> > I have a storage project that is considering adding Thrift or Avro for
> > record packing, and I have a couple of questions.
> > Other than type-ids and field-ids, Avro's and Thrift's designs seem
> > isomorphic. Is the binary format's omission of field-type info something
> > that's set in stone, or is it open for feedback?
> > I prefer the philosophy of Avro, namely to expect schemas to be available,
> > use those schemas for compatibility mapping, and support dynamic schema
> > parsing in any supported language. In fact, being able to parse schemas
> > dynamically in any language is the real draw of Avro for me. (Personally,
> > I'd prefer if they were actually Avro IDL instead of JSON, but I understand
> > that might complicate implementing client stubs.)
> > However, the fact that data is not tagged with any type information is
> > unacceptably dangerous for my application. There will be mechanisms for
> > mapping records to schemas, and schemas will be stored, but if a schema
> > were ever lost or corrupted, I can't afford for the data to turn into
> > total junk.
> > Unless the data is trivially small, encoding a field type wouldn't change
> > the size of the encoding much, but it would provide some 'sanity checking',
> > in addition to making it possible to recover the raw data even if a schema
> > was lost or the schema ID for a piece of data was corrupted.
> > Since Avro is relatively new, I'm asking to find out whether this is
> > anathema to the entire concept of Avro, or something that was chosen but
> > might be reconsidered eventually.
> > For me, going the Thrift route will mean injecting a bit of the Avro
> > philosophy into Thrift, namely adding a Thrift IDL parser to the language
> > I need, so I can save Thrift IDLs and then read them dynamically. However,
> > doing this as a one-off for my language is different from having a
> > supported mechanism for all client languages, as Avro does.
> >
> >
>
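The trade-off David describes, untagged Avro bytes versus self-describing tagged bytes, shows up clearly when one record is encoded both ways. This Python sketch hand-rolls both encodings for `int` and `string` values only: `avro_encode` follows Avro's binary encoding (zigzag varints, length-prefixed strings, fields simply concatenated), while `tagged_encode` lays fields out the way Thrift's binary protocol does (a type byte and a 16-bit field ID before each value, a zero STOP byte at the end). The function names and the `(field_id, value)` input format are assumptions for illustration, and neither function is a complete codec.

```python
import struct

T_STOP, T_I32, T_STRING = 0, 8, 11  # type IDs used by Thrift's binary protocol


def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zigzag-map the sign, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def avro_encode(fields: list) -> bytes:
    """Untagged: values concatenated in schema order; field IDs are not written."""
    out = b""
    for _field_id, value in fields:
        if isinstance(value, int):
            out += zigzag_varint(value)
        else:
            data = value.encode("utf-8")
            out += zigzag_varint(len(data)) + data
    return out


def tagged_encode(fields: list) -> bytes:
    """Tagged: each value is preceded by its type byte and 16-bit field ID."""
    out = b""
    for field_id, value in fields:
        if isinstance(value, int):
            out += struct.pack(">bhi", T_I32, field_id, value)
        else:
            data = value.encode("utf-8")
            out += struct.pack(">bhi", T_STRING, field_id, len(data)) + data
    return out + bytes([T_STOP])


def tagged_walk(data: bytes) -> list:
    """Recover (field_id, value) pairs from tagged bytes without any schema."""
    pos, found = 0, []
    while data[pos] != T_STOP:
        ftype, field_id = struct.unpack_from(">bh", data, pos)
        pos += 3
        (n,) = struct.unpack_from(">i", data, pos)  # i32 value or string length
        pos += 4
        if ftype == T_I32:
            found.append((field_id, n))
        elif ftype == T_STRING:
            found.append((field_id, data[pos:pos + n].decode("utf-8")))
            pos += n
        else:
            raise ValueError(f"unexpected type byte {ftype}")  # sanity check
    return found


record = [(1, "Bob"), (2, 30)]
untagged = avro_encode(record)  # b"\x06Bob<" (5 bytes)
tagged = tagged_encode(record)  # 18 bytes
```

For this tiny record the tagged form is 18 bytes against Avro's 5, and `tagged_walk` recovers every field ID and value with no schema at all, while the 5 untagged bytes mean nothing without one. Part of that size gap comes from the fixed-width 32-bit integers and lengths used here, not only from the tags; as values grow, the fixed per-field overhead matters less, which is the "unless the data is trivially small" caveat in the question.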
David Jeske 2010-11-29, 04:50
Bruce Mitchener 2010-11-29, 05:12
David Jeske 2010-11-29, 06:26
Doug Cutting 2010-11-29, 18:25
David Jeske 2010-11-29, 19:04
Doug Cutting 2010-11-29, 19:37
David Jeske 2010-11-29, 20:16
Tatu Saloranta 2010-11-29, 18:04