Jaydeep Ayachit
2010-10-28, 14:58
Ariel Rabkin
2010-10-28, 16:51
Eric Yang
2010-10-28, 17:02
Jaydeep Ayachit
2010-11-02, 16:48
Jerome Boulon
2010-11-02, 16:53
Jaydeep Ayachit
2010-11-03, 07:10
Ariel Rabkin
2010-11-05, 19:43
-
Data loss on collector side (Jaydeep Ayachit, 2010-10-28, 14:58)
As per the collector design, the collector accepts multiple chunks and writes each chunk to HDFS. If all the chunks are written to HDFS, the collector sends back a 200 status to the agent.

If an HDFS write fails partway through, the collector aborts the entire request and sends an exception. This could mean that the data is partially written to HDFS. I have a couple of questions:

1. The agent does not receive a 200 response. Does it resend the same data to another collector? How does checkpointing work in this case?
2. If the agent sends the same data to another collector and it goes to HDFS, some records are duplicated. Are those duplicates filtered when the preprocessor runs?

In summary, what data loss happens when HDFS goes down, from the collector's perspective?

Thanks,
Jaydeep

Jaydeep Ayachit | Persistent Systems Ltd
Cell: +91 9822393963 | Desk: +91 712 3986747

DISCLAIMER ========== This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
-
Re: Data loss on collector side (Ariel Rabkin, 2010-10-28, 16:51)
Yes, the agent will resend. The checkpoint state will not be advanced until a 200 is received from a collector.

Yes, the demux processing is intended to remove duplicates; if it doesn't, that's a bug.

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
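
To make the resend rule concrete, here is a minimal sketch of the behaviour described above. It is not Chukwa's agent code: the function name `send_with_failover`, the two fake collectors, and the idea of modelling the checkpoint as a simple count of acknowledged chunks are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an HTTP POST to one collector; returns a status code.
Collector = Callable[[Sequence[str]], int]


def send_with_failover(chunks: Sequence[str],
                       collectors: List[Collector],
                       checkpoint: int) -> int:
    """Try each collector in turn and return the new checkpoint.

    The checkpoint only moves forward when some collector answers 200.
    Any other outcome leaves it unchanged, so the same chunks are sent again.
    """
    for post in collectors:
        try:
            status = post(chunks)
        except OSError:
            # Connection-level failure: treat it like a non-200 and move on.
            continue
        if status == 200:
            return checkpoint + len(chunks)
    # No collector acknowledged the batch; keep the old checkpoint.
    return checkpoint


def failing_collector(chunks: Sequence[str]) -> int:
    # Simulates a collector whose HDFS write failed partway through.
    return 500


def healthy_collector(chunks: Sequence[str]) -> int:
    return 200


if __name__ == "__main__":
    new_cp = send_with_failover(["c1", "c2", "c3"],
                                [failing_collector, healthy_collector], 10)
    print(new_cp)
```

Because the failing collector may already have written part of the batch before it gave up, the second collector can store some of the same chunks again. That is where the duplicates in question 2 come from.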
-
Re: Data loss on collector side (Eric Yang, 2010-10-28, 17:02)
On 10/28/10 7:58 AM, "Jaydeep Ayachit" <[EMAIL PROTECTED]> wrote:

> 1. The agent does not receive response 200. Does it resend the same data
> to another collector? How does checkpointing works in this case?

The agent checks for an HTTP 200 response. If it doesn't receive an OK status, it sends the data to another collector from its list. The checkpoint is updated only after an HTTP 200 status is received.

> 2. If the agent sends same data to another collector and it goes to hdfs,
> there is a duplication of some records. Are those duplicates filtered when
> preprocessor runs?

It is possible to build a preprocessor filter that removes duplicate data within a small time window. However, it is not guaranteed to remove 100% of duplicates, because duplicated data can arrive in different batches of the Archive/Demux process. I recommend removing duplicates when the data is being indexed, where a downstream program such as HBase or MySQL has a view of all the data.

> In summary what data loss happens when hdfs goes down from collector
> perspective?

When HDFS goes down, the collector exits. Hence, it is possible to lose up to 15 seconds of data if the last flush to HDFS did not store the data on a datanode. In this case, the collector will not send HTTP code 200 to the agent, and the data is resent by the agent.

There is also a localWriter, which writes data locally on the collector node and then uploads it to HDFS. This assumes the collector's local disk is more reliable than HDFS. I don't think this is a common scenario.

Regards,
Eric
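
The recommendation to deduplicate at indexing time can be sketched as an idempotent, key-based write. This is only an illustration: the record shape `(source, offset, payload)` and the function `index_records` are hypothetical, and a plain dict stands in for whatever downstream store (HBase, MySQL) actually holds the data.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (source, offset, payload). The first two fields
# are assumed to identify a record uniquely, wherever and whenever it arrives.
Record = Tuple[str, int, str]


def index_records(store: Dict[Tuple[str, int], str],
                  records: Iterable[Record]) -> Dict[Tuple[str, int], str]:
    """Write records into the store keyed by (source, offset).

    Writing the same key twice overwrites the earlier copy, so a record that
    was resent by the agent and landed in a later batch does not create a
    second row.
    """
    for source, offset, payload in records:
        store[(source, offset)] = payload
    return store


if __name__ == "__main__":
    store: Dict[Tuple[str, int], str] = {}
    # First batch: partially written before the collector failed.
    index_records(store, [("host1/app.log", 0, "line A"),
                          ("host1/app.log", 7, "line B")])
    # Later batch: the agent resent everything to another collector.
    index_records(store, [("host1/app.log", 0, "line A"),
                          ("host1/app.log", 7, "line B"),
                          ("host1/app.log", 14, "line C")])
    print(len(store))
```

The design point is that the store sees every key, so it does not matter that the two copies arrived in different Archive/Demux batches, which is the case a time-window filter can miss.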
-
Data loss on collector side (Jaydeep Ayachit, 2010-11-02, 16:48)
Hello,

When the collector is in the middle of committing chunks to a sequence file and HDFS becomes unavailable, the collector bails out. What happens to the .chukwa file the collector was working on? This file will not be renamed to a .done file. It would be orphaned, because the next time the collector starts it will not carry on with this file.

Is there any process that looks for orphaned files and takes action on them (such as renaming them to .done)?

Thanks,
Jaydeep
-
Re: Data loss on collector side (Jerome Boulon, 2010-11-02, 16:53)
Hi,

The HDFS writer does not do that, but the LocalWriter (HDFSMover) does exactly this.

/Jerome.
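
As a rough picture of what "looking for orphaned files" could mean, the sketch below scans a directory on startup and renames leftover `.chukwa` files to `.done`. It is not taken from Chukwa's LocalWriter or HDFSMover; the function `recover_orphans`, its `active` parameter, and the use of a local directory are all assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def recover_orphans(directory: Path, active: Iterable[str] = ()) -> List[Path]:
    """Rename leftover *.chukwa files in `directory` to *.done.

    `active` lists file names that a running writer still owns; those are
    skipped. Everything else with a .chukwa suffix is treated as orphaned.
    Returns the new paths of the files that were renamed.
    """
    in_use = set(active)
    recovered: List[Path] = []
    for path in sorted(directory.glob("*.chukwa")):
        if path.name in in_use:
            continue
        done = path.with_suffix(".done")
        path.rename(done)
        recovered.append(done)
    return recovered
```

A real implementation would also need to decide whether a partially written sequence file is still readable before marking it as done; this sketch ignores that question.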
-
RE: Data loss on collector side (Jaydeep Ayachit, 2010-11-03, 07:10)
Hello,

Could you please provide more information on this?

Regards,
Jaydeep
-
Re: Data loss on collector side (Ariel Rabkin, 2010-11-05, 19:43)
https://issues.apache.org/jira/browse/CHUKWA-4 has the design discussion and code.

--Ari

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department