Kafka, mail # user - Re: Persist Queue On HDFS - 2014-01-15, 05:07
Re: Persist Queue On HDFS
Nice idea, but a different sort of animal. Going to HDFS is different: it requires aggregating traffic, so the whole offset commit strategy becomes a concern. When pulling traffic for per-message work, we commit after every pull, so that is exactly once. The tradeoff with aggregation is whether to allow at-least-once delivery, which means duplicates, or to accept some traffic loss under extreme conditions. We chose the latter, since we felt those conditions occur rarely. So we still commit after every message, and if the rack falls over, we lose a couple of hundred (thousand?) messages. We could always replay manually from the last successful offset, since we have that info available somewhere, whereas duplication requires pruning. Plus it seems to avoid the shutdown fetcher syndrome. A concurrent writer to HDFS is helpful for lower latency: just split the traffic into queues. We go to prod for the second time in 3 days.
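
To make the tradeoff concrete, here is a small simulation sketch. It uses no Kafka or HDFS client: `simulate`, `store`, `committed` and `crash_at` are all made-up names, with a list standing in for HDFS and an integer standing in for the committed offset. It only illustrates the two orderings under those assumptions and is not our production code.

```python
def simulate(messages, batch_size, crash_at, commit_per_message):
    """Consume `messages`, crash once at offset `crash_at`, restart, finish.

    commit_per_message=True : commit every message, flush the in-memory
                              buffer only when a batch fills (loss possible).
    commit_per_message=False: write every message out first, commit only at
                              batch boundaries (duplicates possible).
    Returns what ended up in the (hypothetical) store.
    """
    store = []      # stand-in for data durably written to HDFS
    committed = 0   # stand-in for the committed consumer offset

    def consume(start, stop_at=None):
        nonlocal committed
        buffer = []
        for offset in range(start, len(messages)):
            if offset == stop_at:
                return                      # simulated crash: buffer is lost
            if commit_per_message:
                buffer.append(messages[offset])
                committed = offset + 1      # committed before data is durable
                if len(buffer) == batch_size:
                    store.extend(buffer)    # aggregated flush
                    buffer.clear()
            else:
                store.append(messages[offset])   # durable before commit
                if (offset + 1) % batch_size == 0:
                    committed = offset + 1       # commit at batch boundary
        store.extend(buffer)                # clean shutdown: flush remainder
        committed = len(messages)

    consume(0, stop_at=crash_at)   # first run dies at crash_at
    consume(committed)             # restart from the last committed offset
    return store


if __name__ == "__main__":
    msgs = list(range(10))
    print("commit per message:", simulate(msgs, 4, 6, True))
    print("commit per batch:  ", simulate(msgs, 4, 6, False))
```

With ten messages, a batch size of 4 and a crash at offset 6, the first mode drops the two buffered messages and the second mode writes them twice, which is the pruning problem mentioned above.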

Is there any way to use the high-level consumer and read a chunk of traffic at a time through the KafkaStream?
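
Roughly what I have in mind, sketched against a plain Python iterator rather than the real KafkaStream API. `read_chunk`, `max_messages` and `max_wait_seconds` are hypothetical names. Note the assumption: the deadline is only checked between messages, so an iterator that blocks waiting for traffic would still need its own timeout (I believe `consumer.timeout.ms` plays that role in the high-level consumer).

```python
import time


def read_chunk(iterator, max_messages, max_wait_seconds, clock=time.monotonic):
    """Pull up to `max_messages` items from `iterator`.

    Stops early if the iterator is exhausted or the time budget runs out.
    The deadline is checked between items only; a blocking `next()` call
    is not interrupted by this function.
    """
    chunk = []
    deadline = clock() + max_wait_seconds
    while len(chunk) < max_messages and clock() < deadline:
        try:
            chunk.append(next(iterator))
        except StopIteration:
            break   # no more traffic available
    return chunk


if __name__ == "__main__":
    stream = iter(range(7))          # stand-in for a message stream
    while True:
        chunk = read_chunk(stream, 3, 5.0)
        if not chunk:
            break
        print("would write and then commit:", chunk)
```

The caller would write each returned chunk out and then decide when to commit, before or after the write, depending on which side of the tradeoff it wants.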

Thank you,
Robert
On Jan 14, 2014, at 2:16 PM, Blender Bl <[EMAIL PROTECTED]> wrote:

- rob

 