Chukwa, mail # user - Data loss on collector side


Re: Data loss on collector side
Ariel Rabkin 2010-10-28, 16:51
Yes, the Agent will resend. The checkpoint state will not be advanced
until a 200 is received from a collector.
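
To make the resend behaviour concrete, here is a minimal sketch in Python. It is not Chukwa's actual code: the `Agent` class, the `send_pending` method, and the idea of a collector as a plain callable returning an HTTP status are all assumptions made for illustration. The only point it shows is that the checkpoint moves forward when a 200 comes back, and not before.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an HTTP POST to a collector: takes a batch of
# chunks and returns an HTTP status code. Not the real Chukwa interface.
Collector = Callable[[Sequence[bytes]], int]


class Agent:
    def __init__(self, collectors: List[Collector]):
        self.collectors = collectors
        self.checkpoint = 0  # index of the first chunk not yet acknowledged

    def send_pending(self, chunks: Sequence[bytes], batch_size: int = 2) -> bool:
        """Offer the pending batch to each collector; advance only on a 200."""
        batch = chunks[self.checkpoint:self.checkpoint + batch_size]
        if not batch:
            return True
        for post in self.collectors:
            try:
                status = post(batch)
            except Exception:
                continue  # this collector failed; try the same batch elsewhere
            if status == 200:
                self.checkpoint += len(batch)
                return True
        return False  # nothing acknowledged; checkpoint unchanged, retry later


def demo():
    stored = []  # simulated contents of HDFS

    def failing(batch):
        stored.append(batch[0])  # partial write lands before the failure
        raise IOError("simulated HDFS write failure")

    def healthy(batch):
        stored.extend(batch)
        return 200

    agent = Agent([failing, healthy])
    ok = agent.send_pending([b"c0", b"c1"])
    return ok, agent.checkpoint, stored
```

In `demo()`, the first collector writes one chunk and then fails, so the agent resends the whole batch to the second collector. The data is not lost, but the first chunk ends up stored twice, which is where duplicate removal comes in.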

Yes, the demux processing is intended to remove duplicates; if it
doesn't, that's a bug.
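
As a rough illustration of what duplicate removal could look like, here is a small Python sketch. It assumes each record can be identified by a (source, stream, sequence id) key; the `Record` type, its field names, and the `drop_duplicates` function are hypothetical and are not taken from the demux code, which may identify duplicates differently.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout, chosen only for this example.
    source: str   # host that produced the data
    stream: str   # name of the file or stream on that host
    seq_id: int   # position of the record within the stream
    data: bytes


def drop_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Yield each record once, keeping the first copy of every key."""
    seen = set()
    for rec in records:
        key = (rec.source, rec.stream, rec.seq_id)
        if key in seen:
            continue  # a resend after a failed write; skip the extra copy
        seen.add(key)
        yield rec
```

With this, a chunk that was partially written by one collector and then written again by another would collapse back to a single copy, since both copies carry the same key.
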
On Thu, Oct 28, 2010 at 7:58 AM, Jaydeep Ayachit
<[EMAIL PROTECTED]> wrote:
> As per the collector design, the collector accepts multiple chunks and
> writes each chunk to HDFS. If all the chunks are written to HDFS, the
> collector sends a 200 status back to the agent.
>
> If an HDFS write fails partway through, the collector aborts the entire
> request and sends an exception. This could mean that the data is only
> partially written to HDFS. I have a couple of questions:
>
>
>
> 1. The agent does not receive a 200 response. Does it resend the same
> data to another collector? How does checkpointing work in this case?
>
> 2. If the agent sends the same data to another collector and it reaches
> HDFS, some records are duplicated. Are those duplicates filtered out
> when the preprocessor runs?
>
>
>
> In summary, what data loss can occur, from the collector's perspective,
> when HDFS goes down?
>
>
>
> Thanks,
>
> Jaydeep

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department