Avro >> mail # user >> question about completely untagged data...


Re: question about completely untagged data...
On Sun, Nov 28, 2010 at 6:39 PM, David Jeske <[EMAIL PROTECTED]> wrote:
> I have a storage project considering adding Thrift or Avro for record
> packing, and I have a couple questions.
> Other than type-id and field-ids, Avro and Thrift's designs seem
> isomorphic. Is the binary format not including field-type-info something
> that's set in stone, or something that's open for feedback?
...
> Going the thrift route for me will mean injecting a bit of the Avro
> philosophy into Thrift, namely, adding a Thrift IDL parser to the language I
> need, so I can save Thrift IDLs and then dynamically read them. However,
> doing this as a one-off for my language is different than having a supported
> mechanism for all client languages -- like in Avro.

If you really want to keep a bit more descriptive information, you
could also just consider formats that do include property names, like
JSON (with compression).
Depending on exactly what you plan to store, it might be a competitive
choice all around.
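
As a rough illustration of the JSON-with-compression idea, here is a minimal
Python sketch using only the standard library. The record layout and field
names are made up for the example, and how well this compresses will depend
entirely on your actual data.

```python
import gzip
import json

# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"id": i, "name": f"user{i}", "active": i % 2 == 0}
    for i in range(1000)
]

# Self-describing encoding: every record carries its property names.
raw = json.dumps(records).encode("utf-8")

# Compression removes much of the cost of the repeated property names.
packed = gzip.compress(raw)

# Reading back needs no external schema, only the bytes themselves.
restored = json.loads(gzip.decompress(packed).decode("utf-8"))

print(len(raw), len(packed))
```

The point is that the stored bytes stay readable without any schema being
kept somewhere else, at the cost of some CPU time for (de)compression.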

I don't think either Avro or Thrift is aimed at storing data so much as
at transferring it, since having to persist schemas complicates things
significantly (the same is true of protobuf, only more so). And Avro
specifically seems like the best fit for sequences of homogeneous data
entries (DB rows, log entries, etc.). This may or may not be similar to
your use case.
But maybe there are other reasons why you have limited your choice to just
these two formats?
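
To make the "homogeneous entries" point concrete, here is a small Python
sketch of the general idea: write the schema once, then write each row as
untagged values. This is NOT Avro's actual file format or API; the layout,
the SCHEMA list, and the function names are all hypothetical.

```python
import io
import json
import struct

# Hypothetical schema: (field name, struct type code). "q" is a 64-bit
# integer and "d" is a 64-bit float.
SCHEMA = [("id", "q"), ("score", "d")]


def write_rows(rows):
    """Write the schema once, followed by untagged fixed-size rows."""
    buf = io.BytesIO()
    header = json.dumps(SCHEMA).encode("utf-8")
    buf.write(struct.pack("<I", len(header)))  # 4-byte header length
    buf.write(header)
    fmt = "<" + "".join(code for _, code in SCHEMA)
    for row in rows:
        buf.write(struct.pack(fmt, *(row[name] for name, _ in SCHEMA)))
    return buf.getvalue()


def read_rows(data):
    """Recover the schema from the header, then decode every row with it."""
    (header_len,) = struct.unpack_from("<I", data, 0)
    schema = json.loads(data[4:4 + header_len].decode("utf-8"))
    fmt = "<" + "".join(code for _, code in schema)
    size = struct.calcsize(fmt)
    names = [name for name, _ in schema]
    return [
        dict(zip(names, struct.unpack_from(fmt, data, offset)))
        for offset in range(4 + header_len, len(data), size)
    ]


rows = [{"id": 1, "score": 0.5}, {"id": 2, "score": 1.25}]
blob = write_rows(rows)
decoded = read_rows(blob)
```

The cost of the schema is paid once per file, so it is amortized well over
many similar rows and poorly over a handful of one-off records.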

-+ Tatu +-