Chukwa >> mail # user >> Data loss on collector side


Jaydeep Ayachit 2010-11-02, 16:48
Jerome Boulon 2010-11-02, 16:53
Jaydeep Ayachit 2010-11-03, 07:10
Ariel Rabkin 2010-11-05, 19:43
Jaydeep Ayachit 2010-10-28, 14:58
Re: Data loss on collector side
Yes, the Agent will resend. The checkpoint state will not be advanced
until a 200 is received from a collector.
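
To make the resend rule concrete, here is a minimal sketch of the idea, not
Chukwa's actual agent code. The function name send_with_failover and the
model of a collector as a callable returning an HTTP status code are
assumptions made purely for illustration.

```python
from typing import Callable, Sequence


def send_with_failover(data: bytes, offset: int,
                       collectors: Sequence[Callable[[bytes], int]]) -> int:
    """Try each collector in turn and return the new checkpoint offset.

    Hypothetical model: each collector is a callable that takes the payload
    and returns an HTTP status code, or raises OSError if unreachable.
    """
    for post in collectors:
        try:
            status = post(data)
        except OSError:
            continue  # collector unreachable: try the next one
        if status == 200:
            # Only an acknowledged write advances the checkpoint.
            return offset + len(data)
    # No collector answered 200: the checkpoint is unchanged, so the same
    # bytes will be sent again on the next attempt.
    return offset
```

Because the offset only moves on a 200, a collector that fails partway
through its HDFS write causes a resend rather than a loss; the cost is that
some chunks may reach HDFS twice.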

Yes, the demux processing is intended to remove duplicates; if it
doesn't, that's a bug.
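
As a rough illustration of what duplicate removal after a resend could look
like, here is a sketch under stated assumptions. It is not the real demux
job: the Chunk fields (source, stream, seq_id) and the choice of key are
hypothetical and only show the general technique of dropping repeats by
identity.

```python
from typing import Iterable, Iterator, NamedTuple


class Chunk(NamedTuple):
    # Hypothetical fields chosen for illustration only.
    source: str   # host that produced the data
    stream: str   # name of the log or stream on that host
    seq_id: int   # position of this chunk within the stream
    data: bytes


def drop_duplicates(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield each chunk once, keeping the first copy of every identity key."""
    seen = set()
    for chunk in chunks:
        key = (chunk.source, chunk.stream, chunk.seq_id)
        if key in seen:
            continue  # a resent copy of a chunk already written
        seen.add(key)
        yield chunk
```

With a key like this, a chunk that was written once by a failing collector
and again by a second collector collapses back to a single record.
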
On Thu, Oct 28, 2010 at 7:58 AM, Jaydeep Ayachit
<[EMAIL PROTECTED]> wrote:
> As per the collector design, the collector accepts multiple chunks and
> writes each chunk to HDFS. If all the chunks are written to HDFS, the
> collector sends back a 200 status to the agent.
>
> If an HDFS write fails partway through, the collector aborts the entire
> request and returns an exception. This could mean that the data is only
> partially written to HDFS. I have a couple of questions:
>
>
>
> 1. The agent does not receive a 200 response. Does it resend the same
> data to another collector? How does checkpointing work in this case?
>
> 2. If the agent sends the same data to another collector and it goes to
> HDFS, some records are duplicated. Are those duplicates filtered out
> when the preprocessor runs?
>
>
>
> In summary, what data loss happens, from the collector's perspective,
> when HDFS goes down?
>
>
>
> Thanks,
>
> Jaydeep
>
>
>
> Jaydeep Ayachit | Persistent Systems Ltd

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
Eric Yang 2010-10-28, 17:02