Flume, mail # user - about flume-ng agent


Re: about flume-ng agent
Brock Noland 2012-10-25, 16:40
If your Flume agent with the exec source is restarted, your application
could have logged any amount of data in the meantime. On restart, tail -F
just sends the last 10 lines of the current log file and then any new
data, so anything written before those 10 lines while the agent was down
is never sent.
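
The size of that gap is simple arithmetic. Below is a minimal sketch, not
Flume code: the class and method names are hypothetical, and it assumes a
single log file that is not rotated while the agent is down and tail's
default of 10 lines.

```java
// Hypothetical helper that models the restart scenario described above.
// Assumptions: one log file, no rotation, tail starts at the last 10 lines.
final class TailRestartGap {
    // Number of trailing lines tail emits on startup (its default).
    static final int TAIL_DEFAULT_LINES = 10;

    // Zero-based index of the first line a freshly started tail -F emits.
    static int firstLineAfterRestart(int linesInFileAtRestart) {
        return Math.max(0, linesInFileAtRestart - TAIL_DEFAULT_LINES);
    }

    // Lines written while the agent was down that are never sent.
    static int lostLines(int deliveredBeforeStop, int linesInFileAtRestart) {
        return Math.max(0, firstLineAfterRestart(linesInFileAtRestart) - deliveredBeforeStop);
    }

    // Lines sent a second time because tail re-reads the end of the file.
    static int duplicatedLines(int deliveredBeforeStop, int linesInFileAtRestart) {
        return Math.max(0, deliveredBeforeStop - firstLineAfterRestart(linesInFileAtRestart));
    }

    public static void main(String[] args) {
        // Agent stopped after 500 lines; the file holds 1000 lines at restart.
        System.out.println("lost: " + lostLines(500, 1000));
        // Agent stopped after 500 lines; only 5 more lines were written.
        System.out.println("duplicated: " + duplicatedLines(500, 505));
    }
}
```

With these numbers the first case loses 490 lines and the second re-sends
5 lines, so a restart can produce either a gap or duplicates.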

There are other scenarios as well, but the one above is the most
pressing. If you are just writing log4j records, I would recommend the
log4j appender or, since it's a Java app, the RPCClient.
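
For the syslog route mentioned earlier in this thread (for applications
that cannot use the RPCClient), each event is one datagram in BSD syslog
(RFC 3164) layout. The sketch below only illustrates that wire format; Java
is used for consistency, and the class name, host name, tag, facility
(local0) and any port you pass in are assumptions, not values from this
thread. Note that UDP itself gives no delivery acknowledgement.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.util.Locale;

// Hypothetical sender: formats one RFC 3164-style line and sends it over UDP.
final class SyslogUdpSketch {
    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    // Builds "<PRI>Mmm dd hh:mm:ss host tag: message", where PRI = facility * 8 + severity.
    static String format(int facility, int severity, LocalDateTime when,
                         String host, String tag, String message) {
        int pri = facility * 8 + severity;
        // RFC 3164 pads a single-digit day with a space, hence %2d.
        String timestamp = String.format(Locale.ROOT, "%s %2d %02d:%02d:%02d",
                MONTHS[when.getMonthValue() - 1], when.getDayOfMonth(),
                when.getHour(), when.getMinute(), when.getSecond());
        return "<" + pri + ">" + timestamp + " " + host + " " + tag + ": " + message;
    }

    // Sends one line as a single UDP datagram to the collector.
    static void send(String collectorHost, int port, String line) throws IOException {
        byte[] payload = line.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(collectorHost), port));
        }
    }

    public static void main(String[] args) throws IOException {
        // Facility 16 (local0) and severity 6 (informational) are example values.
        String line = format(16, 6, LocalDateTime.now(), "web01", "webapp",
                "GET /index.html 200");
        System.out.println(line);
        // Only send when a collector host and port are given on the command line.
        if (args.length == 2) {
            send(args[0], Integer.parseInt(args[1]), line);
        }
    }
}
```

Run it with the collector's host and the port of its syslog UDP source as
the two arguments; without arguments it only prints the formatted line.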

Brock

On Thu, Oct 25, 2012 at 11:35 AM, iain wright <[EMAIL PROTECTED]> wrote:
> Hi Brock,
>
> Can you please expand on tail -F losing large amounts of data?
>
> Would reprocessing log files ensure with reasonable certainty that all data
> made it into HBase?
>
> We are about to put Flume into prod for writing transactions to HBase. I
> must have missed the bit in the docs where tail -F is prone to data loss.
>
> Our source app is Java; we were just writing to a file with log4j.
>
> Thank you and have a great day,
>
> Iain Wright
>
> On Oct 25, 2012 9:22 AM, "Brock Noland" <[EMAIL PROTECTED]> wrote:
>>
>> If you cannot use the RPCClient (the project is not in Java), then writing
>> the events to syslog and sending those events to a "collector" agent
>> running a syslog source is probably the best option. A worse option
>> would be to use the exec source with tail -F. This is "worse" because it
>> can easily lose large amounts of data.
>>
>> Brock
>>
>> On Thu, Oct 25, 2012 at 11:00 AM, lancexxx <[EMAIL PROTECTED]>
>> wrote:
>> > Oh, I think I see. Sorry, I am new to Flume.
>> > I am now collecting logs from a web server and want to use the syslogudp
>> > source. Which tool or RPCClient should I use on the web server host to
>> > send the data to the source of the flume-ng agent? Or can you recommend
>> > a better source type, such as the Avro source or the syslog source? I
>> > do not understand the differences or advantages between them, and I
>> > could not find more information in the official guide.
>> > Thanks very much!
>> > --
>> > lancexxx
>> >
>> > On Thursday, October 25, 2012 at 10:37 PM, Brock Noland wrote:
>> >
>> > Either the webserver must run a flume agent, the webserver must use
>> > the RPCClient (just a java object, not an agent) or the webserver can
>> > use the log4j appender.
>> >
>> > Brock
>> >
>> > On Wed, Oct 24, 2012 at 10:51 PM, lancexxx <[EMAIL PROTECTED]>
>> > wrote:
>> >
>> >
>> > Hi,
>> > I do not understand: must every web server host run a flume-ng agent
>> > if I want to collect web logs?
>> > If not, how does the client (the web server host) send its logs to the
>> > flume-ng agent host over the internet?
>> > --
>> > thanks!
>> > lancexxx
>> >
>> >
>> >
>> >
>> > --
>> > Apache MRUnit - Unit testing MapReduce -
>> > http://incubator.apache.org/mrunit/
>> >
>> >
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/