consumer may lose data
Hi there, this is my first experience with Kafka. We've deployed it in production (soft release) and are using it to create a real-time stream of data. We love it so far.

Running in production, we see error messages like these every once in a while:

[2012-06-09 09:02:36,051] ERROR [pool-1-thread-4] (ConsumerIterator.scala 74) - consumed offset: 22013667532 doesn't match fetch offset: 21008498593 for firehose:1-23: fetched offset = 22013667532: consumed offset = 22013667532;
 Consumer may lose data
[2012-06-09 09:22:48,520] ERROR [pool-1-thread-4] (ConsumerIterator.scala 74) - consumed offset: 21013192930 doesn't match fetch offset: 21475567914 for firehose:1-3: fetched offset = 22021419503: consumed offset = 21013192930;
 Consumer may lose data
[2012-06-09 09:42:34,342] ERROR [pool-1-thread-1] (ConsumerIterator.scala 74) - consumed offset: 21017992042 doesn't match fetch offset: 21477363255 for firehose:1-5: fetched offset = 22029075985: consumed offset = 21017992042;
 Consumer may lose data
[2012-06-09 09:46:50,599] ERROR [pool-1-thread-1] (ConsumerIterator.scala 74) - consumed offset: 21017498912 doesn't match fetch offset: 21476883323 for firehose:1-7: fetched offset = 22022716494: consumed offset = 21017498912;
 Consumer may lose data
[2012-06-09 09:50:54,912] ERROR [pool-1-thread-1] (ConsumerIterator.scala 74) - consumed offset: 21016428723 doesn't match fetch offset: 21475750245 for firehose:1-4: fetched offset = 22027573299: consumed offset = 21016428723;
 Consumer may lose data
[2012-06-09 09:58:29,709] ERROR [pool-1-thread-1] (ConsumerIterator.scala 74) - consumed offset: 21017643906 doesn't match fetch offset: 21477006308 for firehose:1-6: fetched offset = 22025778964: consumed offset = 21017643906;
 Consumer may lose data
[2012-06-09 09:59:04,622] ERROR [pool-1-thread-4] (ConsumerIterator.scala 74) - consumed offset: 21017419393 doesn't match fetch offset: 21476749439 for firehose:1-23: fetched offset = 22025584690: consumed offset = 21017419393;

I am a bit unsure what Kafka does when the consumed offset doesn't match the fetch offset. We are using a pool of threads to consume each stream created by ConsumerConnector.createMessageStreams(). Is this kosher?
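
To make the setup concrete, here is a minimal sketch of one reading of it, assuming each stream is handed to exactly one worker thread in the pool. It is illustrative only and is not the production code: plain `Iterator<String>` objects stand in for the Kafka message streams, and the class and method names (`StreamPoolSketch`, `consumeAll`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class StreamPoolSketch {
    // Stand-in for the streams returned by createMessageStreams():
    // each "stream" here is just an Iterator<String> (an assumption for illustration).
    // Every stream is submitted to the pool exactly once, so no iterator
    // is ever read by two threads at the same time.
    public static int consumeAll(List<Iterator<String>> streams) throws Exception {
        // One pool thread per stream; at least one thread so an empty list is harmless.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, streams.size()));
        List<Future<Integer>> results = new ArrayList<>();
        try {
            for (Iterator<String> stream : streams) {
                results.add(pool.submit(() -> {
                    int count = 0;
                    while (stream.hasNext()) {
                        stream.next(); // real code would process the message here
                        count++;
                    }
                    return count;
                }));
            }
            // Wait for every worker and add up how many messages each one consumed.
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

The part I am unsure about is the alternative arrangement, where several pool threads pull from the same stream's iterator concurrently; the sketch deliberately avoids that.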