Avro >> mail # user >>


Essentially we are instrumenting distributed applications. The format of the
instrumentation messages is defined in an Avro schema. The messages are
transported over a message queue (e.g., RabbitMQ) or (eventually) over Flume
and dumped into HDFS, from where they are loaded into Hive for querying.

In HDFS we can certainly consolidate the data into a small number of files,
so the schema in each file header costs little there. But I want to know
whether we can reduce network bandwidth by generating valid messages on the
client side without the schema in the header.
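
To make the question concrete, here is a rough sketch of what I mean by a
message without the schema in the header. It hand-encodes a record using the
Avro binary encoding rules (zigzag varint longs, length-prefixed UTF-8
strings, fields written in schema order) and prefixes only a short schema
identifier. The Event schema, the 8-byte MD5 prefix, and the function names
are all hypothetical, and the fingerprint is taken over plain JSON rather than
Avro's Parsing Canonical Form, so treat this as an illustration and not as how
the Avro libraries do it.

```python
import hashlib
import json

# Hypothetical schema for an instrumentation event (not from this thread).
SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "ts", "type": "long"},
        {"name": "msg", "type": "string"},
    ],
}
SCHEMA_JSON = json.dumps(SCHEMA, separators=(",", ":"), sort_keys=True)
# Assumption: an 8-byte MD5 prefix identifies the schema; the reader must
# already hold the full schema (shared out of band) to decode the body.
SCHEMA_ID = hashlib.md5(SCHEMA_JSON.encode("utf-8")).digest()[:8]


def encode_long(n):
    """Avro long: zigzag-encode, then emit 7-bit groups, low group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf, pos):
    """Inverse of encode_long; returns (value, next position)."""
    shift = 0
    z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_event(ts, msg):
    """Schema id, then the record fields in schema order. No schema JSON."""
    raw = msg.encode("utf-8")
    return SCHEMA_ID + encode_long(ts) + encode_long(len(raw)) + raw


def decode_event(payload):
    """Check the schema id, then decode the fields in schema order."""
    if payload[:8] != SCHEMA_ID:
        raise ValueError("unknown schema id")
    ts, pos = decode_long(payload, 8)
    length, pos = decode_long(payload, pos)
    return ts, payload[pos:pos + length].decode("utf-8")


payload = encode_event(1395087420000, "hello")
print(len(payload), "bytes on the wire vs", len(SCHEMA_JSON), "bytes of schema")
print(decode_event(payload))
```

The point is only that the per-message overhead drops to a fixed-size
identifier, at the cost of the consumer needing the schema from somewhere
else before it can read anything.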

Does that make sense?

Shaq
On Mon, Mar 17, 2014 at 4:17 PM, Sean Busbey <busbey+[EMAIL PROTECTED]> wrote:
 