lancexxx
2012-10-25, 03:51
Brock Noland
2012-10-25, 14:37
lancexxx
2012-10-25, 16:00
Brock Noland
2012-10-25, 16:21
Paul Chavez
2012-10-25, 16:54
Brock Noland
2012-10-25, 17:25
Paul Chavez
2012-10-25, 18:16
Brock Noland
2012-10-25, 18:47
Will McQueen
2012-10-25, 20:38
Paul Chavez
2012-10-25, 20:59
lancexxx
2012-10-26, 02:17
Hari Shreedharan
2012-10-26, 02:44
lancexxx
2012-10-26, 02:56
Hari Shreedharan
2012-10-26, 03:12
lancexxx
2012-10-26, 03:18
iain wright
2012-10-25, 16:35
Brock Noland
2012-10-25, 16:40
-
about flume-ng agent
lancexxx 2012-10-25, 03:51
Hi,

I don't understand: if I collect web server logs, must every web server host run a flume-ng agent? If not, how does the client (the web server host) send its logs to the flume-ng agent host over the internet?

--
Thanks!
lancexxx
-
Re: about flume-ng agent
Brock Noland 2012-10-25, 14:37
Either the webserver must run a Flume agent, the webserver must use the RPCClient (just a Java object, not an agent), or the webserver can use the log4j appender.

Brock

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
-
Re: about flume-ng agent
lancexxx 2012-10-25, 16:00
Oh, I think I see. Sorry, I am new to Flume.

I collect logs from a web server and want to use the syslogUDP source. Which tool or RPC client should I use on the web server host to send the data to the source of the flume-ng agent? Or can you recommend a better source type, such as the Avro source or the syslog source? I don't understand the differences or advantages between them, and I could not find more information in the official guide.

Thanks very much!
--
lancexxx
-
Re: about flume-ng agent
Brock Noland 2012-10-25, 16:21
If you cannot use the RPCClient (because the project is not in Java), then writing the events to syslog and sending those events to a "collector" agent running a syslog source is probably the best option. A worse option would be to use the exec source with tail -F. This is "worse" because it can easily lose large amounts of data.

Brock
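For a non-Java web application, the syslog route Brock describes might look like the minimal Python sketch below. It assumes the standard library's `logging.handlers.SysLogHandler` as the sender; a throwaway local UDP socket stands in for the collector agent's syslogUDP source so the sketch runs without Flume. In a real deployment the address would be whatever host and port the agent's syslog source listens on, and the logger name `weblog-demo` is hypothetical.

```python
import logging
import logging.handlers
import socket


def attach_syslog_handler(logger: logging.Logger, host: str, port: int) -> logging.Handler:
    """Send every record from `logger` to a syslog collector over UDP."""
    # With a (host, port) address, SysLogHandler uses a UDP (SOCK_DGRAM) socket by default.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    logger.addHandler(handler)
    return handler


def send_demo_line(message: str) -> bytes:
    """Log one line and return the raw datagram a collector would receive."""
    # Stand-in for the collector agent's syslog source: a local UDP socket.
    collector = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    collector.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    collector.settimeout(2.0)
    host, port = collector.getsockname()

    logger = logging.getLogger("weblog-demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = attach_syslog_handler(logger, host, port)
    try:
        logger.info(message)
        datagram, _ = collector.recvfrom(4096)
    finally:
        # Detach and close so repeated calls do not accumulate handlers.
        logger.removeHandler(handler)
        handler.close()
        collector.close()
    return datagram


if __name__ == "__main__":
    # Default facility LOG_USER (1) and level INFO (6) give priority 1 * 8 + 6 = 14,
    # so the datagram starts with b"<14>".
    print(send_demo_line("GET /index.html 200"))
```

Each log call becomes one datagram, so this route inherits UDP's size and delivery limits; it is a convenience for non-Java apps, not a replacement for the RPCClient.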
-
RE: about flume-ng agent
Paul Chavez 2012-10-25, 16:54
I have tried to go the syslogUDP route to get log files from a Windows server to a Flume agent, and did not find it an adequate solution:

- We are seeing corrupted events when sending IIS logs (known issue: https://issues.apache.org/jira/browse/FLUME-1365).
- Our data is too large to fit in a 1500-byte Ethernet frame, so events are fragmented; the syslogUDP source ignores the continuation packets, resulting in truncated events.

I have just been able to build the flume-ng agent for Windows and am testing the avro-client functionality on Windows. I think this will be the best bet for us in the short term, using LogParser, run as a scheduled task, to incrementally create files for the avro client to send along.

Long term, we want to develop a .NET Avro client library for our apps to use directly. I suppose a log4net Avro appender would be nice too.
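The size limit behind Paul's second point can be made concrete with a little arithmetic. On a 1500-byte Ethernet MTU, an IPv4 header without options takes 20 bytes and the UDP header 8, so a syslog message larger than 1500 - 20 - 8 = 1472 bytes cannot travel in a single unfragmented datagram. The Python sketch below encodes that check; the 1500-byte MTU and the absence of IP options are assumptions about the network path.

```python
ETHERNET_MTU = 1500  # bytes; typical Ethernet MTU (assumed for the path)
IPV4_HEADER = 20     # bytes; minimum IPv4 header, no options
UDP_HEADER = 8       # bytes; fixed UDP header size


def max_unfragmented_payload(mtu: int = ETHERNET_MTU) -> int:
    """Largest UDP payload that fits in one IPv4 packet without fragmentation."""
    return mtu - IPV4_HEADER - UDP_HEADER


def fits_in_one_datagram(message: str, mtu: int = ETHERNET_MTU) -> bool:
    """True if the UTF-8 encoded syslog message avoids IP fragmentation."""
    return len(message.encode("utf-8")) <= max_unfragmented_payload(mtu)


if __name__ == "__main__":
    short_line = "<14>GET /index.html 200"
    long_line = "<14>" + "x" * 2000  # stands in for a wide IIS log line
    print(max_unfragmented_payload())        # 1472
    print(fits_in_one_datagram(short_line))  # True
    print(fits_in_one_datagram(long_line))   # False
```

Note that the limit is in encoded bytes, not characters, which matters for non-ASCII log content.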
-
Re: about flume-ng agent
Brock Noland 2012-10-25, 17:25
Hi,

Questions inline:

On Thu, Oct 25, 2012 at 11:54 AM, Paul Chavez wrote:
> - Our data is too large to fit in a 1500 byte ethernet frame so events are fragmented and the syslogUDP source ignores the continuation packets, resulting in truncated events.

I am not a syslog expert; is there a way to detect the continuation packets? If so, I think we should open a JIRA on this issue.

> Long term we want to develop a .net Avro client library for our apps to use directly. I suppose a log4net avro appender would be nice too.

I think a .NET version of the RPCClient and a log4net appender would be awesome. If you do write them, would you consider pushing them back to the Flume community?

Brock
-
RE: about flume-ng agent
Paul Chavez 2012-10-25, 18:16
Answers inline:

> I am not a Syslog expert, is there a way to detect the continuation packets? If so I think we should open a JIRA on this issue.

I am no expert either, but the various syslog-related RFCs and RFC-type documentation I can find recommend that messages be kept small in order to avoid fragmentation. From an IP perspective, there are IPv4 header flags that indicate whether more fragments follow, along with a fragment identifier to help reassemble out-of-order packets, so it certainly is possible, even if it goes against RFC recommendations.

Testing with the syslogTCP source did not show any issues with fragmentation, but the tool we are using to send syslog messages over TCP (LogParser) does not separate messages with a carriage return, so the messages weren't parsed correctly by the Flume source.

> I think a .NET version of the RPCClient and log4net appender would be awesome. If you do write them, would you consider pushing them back to the flume community?

I would hope so, but I am not in a position to make any guarantees on behalf of my employer.
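The IPv4 fields Paul mentions can be decoded from a raw header, as in the illustrative Python sketch below. Bytes 4-5 carry the identification value shared by all fragments of one datagram, and bytes 6-7 hold three flag bits (reserved, Don't Fragment, More Fragments) followed by a 13-bit fragment offset counted in 8-byte units; a packet is a fragment if More Fragments is set or its offset is non-zero. The sample headers are hand-built and the helper names are hypothetical. In practice the receiving host's IP stack normally reassembles fragments before a UDP socket sees the datagram, so this only shows what the fields mean.

```python
import struct
from typing import NamedTuple


class FragmentInfo(NamedTuple):
    identification: int   # shared by all fragments of one datagram
    dont_fragment: bool   # DF flag
    more_fragments: bool  # MF flag: more fragments follow this one
    offset_bytes: int     # position of this fragment's data in the datagram

    @property
    def is_fragment(self) -> bool:
        # All fragments but the last have MF set; all but the first have offset > 0.
        return self.more_fragments or self.offset_bytes > 0


def parse_fragment_fields(ip_header: bytes) -> FragmentInfo:
    """Decode the fragmentation fields from the first 8 bytes of an IPv4 header."""
    if len(ip_header) < 8:
        raise ValueError("need at least 8 bytes of IPv4 header")
    # Bytes 4-5: identification; bytes 6-7: 3 flag bits + 13-bit offset (network order).
    identification, flags_and_offset = struct.unpack("!HH", ip_header[4:8])
    return FragmentInfo(
        identification=identification,
        dont_fragment=bool(flags_and_offset & 0x4000),
        more_fragments=bool(flags_and_offset & 0x2000),
        offset_bytes=(flags_and_offset & 0x1FFF) * 8,  # offset counts 8-byte blocks
    )


def header_prefix(total_length: int, ident: int, flags_and_offset: int) -> bytes:
    """Build the first 8 header bytes (version 4, IHL 5) for illustration."""
    return struct.pack("!BBHHH", 0x45, 0, total_length, ident, flags_and_offset)


if __name__ == "__main__":
    # A 2008-byte IP payload split on a 1500-byte MTU: 1480 data bytes, then 528.
    first = header_prefix(1500, 0xBEEF, 0x2000)     # MF set, offset 0
    second = header_prefix(548, 0xBEEF, 1480 // 8)  # MF clear, offset 185 blocks
    for hdr in (first, second):
        info = parse_fragment_fields(hdr)
        print(info, info.is_fragment)
```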
-
Re: about flume-ng agent
Brock Noland 2012-10-25, 18:47
On Thu, Oct 25, 2012 at 1:16 PM, Paul Chavez wrote:
> I am no expert either but the various syslog related RFC and RFC-type documentation I can find recommends that messages be kept small in order to avoid fragmentation.

Ahh OK.

> Testing with the syslogTCP source did not show any issues with fragmentation, but the tool we are using to send syslog messages over TCP (LogParser) does not separate messages with a carriage return so messages weren't parsed correctly by the flume source.

Yes, TCP/IP will deliver the events in order.

> I would hope so but I am not in a position to make any guarantees on behalf of my employer.

No worries, I understand!

Brock
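The TCP problem Paul ran into is one of framing rather than ordering: TCP delivers a byte stream in order but has no message boundaries, so the receiver needs a delimiter to split events. Assuming the Flume syslogTCP source splits events on a trailing newline (consistent with what Paul observed), a sender would terminate every message with `\n`; RFC 6587 calls this non-transparent framing. The Python sketch below shows the sending side and a receiver splitting a stream that arrives in arbitrary chunks; `frame` and `LineDeframer` are hypothetical names, not Flume APIs.

```python
from typing import Iterable, List


def frame(messages: Iterable[str]) -> bytes:
    """Terminate each message with a newline (RFC 6587 non-transparent framing)."""
    out = bytearray()
    for msg in messages:
        if "\n" in msg:
            # An embedded newline would split one event into two on the receiver.
            raise ValueError("message contains the newline delimiter")
        out += msg.encode("utf-8") + b"\n"
    return bytes(out)


class LineDeframer:
    """Split a TCP byte stream, received in arbitrary chunks, on newlines."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[str]:
        """Add a chunk and return every message completed by it."""
        self._buffer += chunk
        messages = []
        while True:
            end = self._buffer.find(b"\n")
            if end < 0:
                break  # incomplete message: wait for more bytes
            messages.append(self._buffer[:end].decode("utf-8"))
            del self._buffer[: end + 1]
        return messages


if __name__ == "__main__":
    stream = frame(["<14>GET /a 200", "<14>GET /b 404"])
    deframer = LineDeframer()
    received = []
    # Simulate TCP handing over the stream in 5-byte reads.
    for i in range(0, len(stream), 5):
        received += deframer.feed(stream[i:i + 5])
    print(received)
    # Without trailers, the receiver never sees a message boundary.
    print(LineDeframer().feed(b"<14>GET /a 200<14>GET /b 404"))
```

The second call returns nothing because no delimiter ever arrives, which mirrors the unparsed LogParser stream.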
-
Re: about flume-ng agent
Will McQueen 2012-10-25, 20:38
Would the new HttpSource work for you?
-
RE: about flume-ng agent
Paul Chavez 2012-10-25, 20:59
I was unaware of an HTTPSource; after reviewing the FLUME-1199 issue, it may well be the best fit for our use case.

That said, while I am manually building and testing the latest snapshot on the Windows side, our actual Hadoop machines are running CDH4.1.1, which has flume-ng 1.2.0. When a version of flume-ng containing HTTPSource is packaged along with the rest of the Hadoop distribution, I will look at it. As a 'windoze guy' ;-) I do not manage the Hadoop systems.

Thank you,
Paul Chavez
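A client for the HTTP source discussed in FLUME-1199 could be as small as the Python sketch below. Because each HTTP request carries its own body length, it sidesteps both the datagram size limit and the newline framing issue. The sketch assumes the JSON handler that the HTTP source uses by default in Flume 1.3.0, which expects a JSON array of events, each with a `headers` map and a string `body`; the endpoint URL and the `host` header are hypothetical, and the code only builds the request so it runs without a live agent.

```python
import json
import urllib.request
from typing import Dict, Iterable, List, Optional


def build_event_batch(lines: Iterable[str],
                      headers: Optional[Dict[str, str]] = None) -> List[dict]:
    """Encode log lines as a JSON-ready array of {"headers": ..., "body": ...} events."""
    base = dict(headers or {})
    # Copy the headers per event so later edits to one event do not affect others.
    return [{"headers": dict(base), "body": line} for line in lines]


def build_request(url: str, events: List[dict]) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP POST carrying the events as JSON."""
    payload = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


if __name__ == "__main__":
    events = build_event_batch(
        ["GET /index.html 200", "GET /missing 404"],
        headers={"host": "iis-web-01"},  # hypothetical header
    )
    req = build_request("http://flume-collector.example:5140", events)  # hypothetical endpoint
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Batching several lines per request keeps the per-event overhead low; check the handler documentation for your Flume version before relying on this payload shape.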
-
Re: about flume-ng agent
lancexxx 2012-10-26, 02:17
Can the HTTPSource be used with flume-ng 1.3.0 now?

--
lancexxx
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
-
Re: about flume-ng agent
Hari Shreedharan 2012-10-26, 02:44
The HTTPSource will be a part of Apache Flume-1.3.0.

Thanks,
Hari

--
Hari Shreedharan
-
Re: about flume-ng agent
lancexxx 2012-10-26, 02:56
"Will"? Does that mean it cannot be used yet?

--
lancexxx
-
Re: about flume-ng agent
Hari Shreedharan 2012-10-26, 03:12
Well, you can use it if you clone trunk and build it. It is just not in a release yet.

Hari
-
Re: about flume-ng agent
lancexxx 2012-10-26, 03:18
OK, I will wait for a release, because even if I wanted to use it now, there is no configuration sample yet.

Good luck to you.

--
lancexxx
-
Re: about flume-ng agent
iain wright 2012-10-25, 16:35
Hi Brock,

Can you please expand on tail -F losing large amounts of data? Would reprocessing the log files ensure, with reasonable certainty, that all data made it into HBase?

We are about to put Flume into production for writing transactions to HBase; I must have missed the part of the docs where tail -F is prone to data loss.

Our source app is Java; we were just writing to a file with log4j.

Thank you and have a great day,
Iain Wright
-
Re: about flume-ng agent
Brock Noland 2012-10-25, 16:40
If your Flume agent with the exec source is restarted, your application could have logged any amount of data (say, N lines) in the meantime. tail -F is just going to send the last 10 lines of the current log file and then any new data. There are other scenarios as well, but the one above is the most serious.

If you are just writing log4j records, I would recommend the log4j appender or, since it's a Java app, the RPCClient.

Brock
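Brock's restart scenario can be quantified with a small simulation. On restart, `tail -F` (GNU tail's default) re-emits only the last 10 lines of the current file and then follows new writes, so anything beyond those 10 lines that was logged while the agent was down never reaches Flume; if fewer than 10 lines were logged, some already-delivered lines are sent again instead. The Python sketch below models only that behavior: it assumes everything written before the crash was delivered and that the file was not rotated while the agent was down, so it understates the loss in messier cases. The function name is hypothetical.

```python
from typing import Tuple


def tail_restart_effect(lines_before_crash: int, lines_while_down: int,
                        tail_lines: int = 10) -> Tuple[int, int]:
    """Return (lost, duplicated) line counts after restarting `tail -F`.

    Assumes every line written before the crash was already delivered and
    that the file was not rotated while the agent was down.
    """
    total = lines_before_crash + lines_while_down
    # On restart, tail emits the last `tail_lines` lines (or the whole, shorter file).
    replayed_start = max(0, total - tail_lines)
    # New lines written before the replayed window are never sent.
    lost = max(0, replayed_start - lines_before_crash)
    # Replayed lines that were already delivered before the crash are duplicates.
    duplicated = max(0, lines_before_crash - replayed_start)
    return lost, duplicated


if __name__ == "__main__":
    for before, down in [(1000, 3), (1000, 5000)]:
        lost, dup = tail_restart_effect(before, down)
        print(f"{down} lines written while down: {lost} lost, {dup} re-sent")
```

This is why Brock steers Java apps toward the log4j appender or the RPCClient: the application hands events to Flume directly instead of relying on a file position that tail forgets on restart.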