Chukwa, mail # dev - Creating a new adaptor: FileTailingAdaptor that would not cut lines


Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines
Luangsay Sourygna 2013-04-19, 19:01
Well, the log4j socket adaptor may be great if you control the software that
generates the logs.
That is usually not my case: customers don't really like having to install
a Chukwa agent on their production servers, so I don't want to think about
telling them to change the logging system of their software.

As for partial lines when log files rotate, I don't think this is something
Chukwa should manage (what is more, how could Chukwa be aware that there is
a problem?).
In my view, this would be an error of the "logrotate" system. As far as I
know, the RFA and DRFA log4j appenders handle rotation quite well.

Regards,

Sourygna
On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[EMAIL PROTECTED]> wrote:

> I think the best solution is to use the Log4j socket appender and the Chukwa
> log4j socket adaptor to get the full log entry without worrying about line
> feeds.  However, this solution only works with programs written in
> Java, and does not keep a copy of the existing log file on disk.
>
> I think your proposal is a good way to handle tailing text files so that
> only complete, line-delimited entries are sent.  How do we handle a partial
> line when the log file has rotated?
>
> regards,
> Eric
>
> On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:
>
> > Hi all,
> >
> > FileTailingAdaptor is great to tail log files and send them to Hadoop.
> > However, the last line of a chunk is usually cut, which leads to some
> > errors.
> >
> > I know that we can use CharFileTailingAdaptorUTF8 to solve this problem.
> > Nonetheless, this adaptor calls the MapProcessor.process() method for
> > every line in each chunk, which slows down the Demux phase a lot.
> >
> > I suggest creating a new adaptor that would mix the benefits of the two
> > adaptors: the (Demux) speed of FileTailingAdaptor and
> > the preservation of lines from CharFileTailingAdaptorUTF8.
> >
> > The implementation of extractRecords() would be:
> > - "for loop" on the buffer, starting from the end of the buffer and going
> > backward
> > - if we find a separator, save the offset and exit the loop
> > - the rest of the method would be similar to CharFileTailingAdaptorUTF8.
> >
> > Could you guys please tell me what you think about it?
> > How do you currently manage these cut lines with Chukwa?
> >
> > Regards,
> >
> > Sourygna
> >
>
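For reference, the backward scan proposed in the original message (walk the
buffer from its end, stop at the first separator found, emit everything up to
that offset) could be sketched roughly as below. This is only an illustration
in plain Java and not Chukwa code: the class name LineBoundaryScan and the
methods lastSeparatorOffset() and completeLines() are hypothetical, the
separator is assumed to be '\n', and the real extractRecords() signature is
not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Chukwa: finds the last line separator in a
// buffer so that only complete lines are emitted as one chunk.
final class LineBoundaryScan {
    // Assumption: lines are delimited by '\n'. In UTF-8 the byte 0x0A never
    // occurs inside a multi-byte sequence, so a byte-level scan is safe.
    static final byte SEPARATOR = (byte) '\n';

    /** Returns the index of the last separator in buf[0..length), or -1 if none. */
    static int lastSeparatorOffset(byte[] buf, int length) {
        // Start from the end of the buffer and go backward.
        for (int i = length - 1; i >= 0; i--) {
            if (buf[i] == SEPARATOR) {
                return i; // save the offset and exit the loop
            }
        }
        return -1;
    }

    /**
     * Returns the bytes up to and including the last separator. The result is
     * empty when the buffer holds no complete line yet.
     */
    static byte[] completeLines(byte[] buf, int length) {
        int last = lastSeparatorOffset(buf, length);
        return Arrays.copyOfRange(buf, 0, last + 1);
    }

    public static void main(String[] args) {
        byte[] buf = "line one\nline two\npartial".getBytes(StandardCharsets.UTF_8);
        byte[] whole = completeLines(buf, buf.length);
        // The two complete lines are emitted; the trailing fragment is held back.
        System.out.print(new String(whole, StandardCharsets.UTF_8));
        System.out.println("bytes held back: " + (buf.length - whole.length));
    }
}
```

With this approach the whole run of complete lines would go out as a single
chunk, and the bytes after the last separator would stay unread until the next
pass, when the rest of that line has presumably been written.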