Chukwa, mail # dev - Creating a new adaptor: FileTailingAdaptor that would not cut lines


Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines
Luangsay Sourygna 2013-04-21, 22:07
As I said before, I don't think Chukwa should handle those situations, since
I think this is a "log rotation" problem.
Personally, I have never seen such a problem (the log4j RFA, for instance,
has a somewhat "flexible" size limit, and every rotated file I have seen
ended with a \n).

On the other hand, there is a special situation I think Chukwa should take
care of.
The default value of the "chukwaAgent.fileTailingAdaptor.maxReadSize"
setting is 128kB, which means that if a line/record is bigger than that
size, the record won't be sent by the agent.
We get a warning in Chukwa's log, but the record is lost (see the
LWFTAdaptor.slurp() method).
In such a case, would it be possible to temporarily increase MAX_READ_SIZE
so that we are able to send that one record over the wire?
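
To make the idea concrete, here is a rough sketch of what I have in mind.
It is not the actual LWFTAdaptor code: the class name, the method names and
the hardLimit parameter are all hypothetical, and I read from a plain
InputStream instead of the tailed file. The buffer starts at maxReadSize,
is scanned backward for the last \n, and is doubled (up to hardLimit) only
while no complete record has been seen.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper, not part of Chukwa: illustrates the idea only.
public class GrowingLineReader {

    /** Returns the index of the last '\n' in buf[0..len), or -1 if none. */
    public static int lastSeparator(byte[] buf, int len) {
        // Scan backward so we stop at the last complete record.
        for (int i = len - 1; i >= 0; i--) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /**
     * Reads from 'in' and returns the bytes up to and including the last
     * '\n' found. The buffer starts at maxReadSize (assumed > 0) and is
     * doubled, up to hardLimit, while no separator has been found.
     * Returns an empty array if no complete record fits within hardLimit
     * or if the stream ends first.
     */
    public static byte[] readCompleteRecords(InputStream in, int maxReadSize, int hardLimit) {
        byte[] buf = new byte[maxReadSize];
        int len = 0;
        try {
            while (true) {
                int n = in.read(buf, len, buf.length - len);
                if (n > 0) {
                    len += n;
                }
                int sep = lastSeparator(buf, len);
                if (sep >= 0) {
                    // Bytes after 'sep' belong to a partial line; a real
                    // tailer would leave them for the next read.
                    return Arrays.copyOf(buf, sep + 1);
                }
                if (n < 0) {
                    return new byte[0]; // end of stream, no complete line yet
                }
                if (len == buf.length) {
                    if (buf.length >= hardLimit) {
                        return new byte[0]; // record too large even after growing
                    }
                    int newSize = (int) Math.min((long) buf.length * 2, hardLimit);
                    buf = Arrays.copyOf(buf, newSize);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, the normal case still reads at most maxReadSize
bytes, and the larger buffer is only allocated for the rare oversized
record. The upper bound is there so that a file without any line feed
cannot make the agent grow its buffer forever.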

Regards,

Sourygna
On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <[EMAIL PROTECTED]> wrote:

> Do we need to consider rotation based on size?  For example, the last line
> of a log file that reaches 300MB: there is no line break in the first file,
> but the entry continues into the next rotated log and then has a line feed
> delimiter.  If we are splitting lines based on \n, then we can reconstruct
> the full line between the two files. I am not sure if this case needs to be
> supported?
>
> regards,
> Eric
>
>
> On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:
>
> > Well, the log4j socket adaptor may be great if you control the software
> > that generates the logs.
> > That is not usually my case: customers don't really like having to
> > install Chukwa agents on their production servers, so I don't want to
> > think about telling them to change the logging system of their software.
> >
> > As for partial lines when log files rotate, I don't think this is
> > something Chukwa should manage (what is more: how could Chukwa be aware
> > there is a problem?).
> > In my view, this would be an error of the "logrotate" system. As far as
> > I know, the RFA and DRFA log4j appenders handle rotation quite well.
> >
> > Regards,
> >
> > Sourygna
> >
> >
> > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[EMAIL PROTECTED]> wrote:
> >
> > > I think the best solution is to use the Log4j socket appender and the
> > > Chukwa log4j socket adaptor to get the full log entry without worrying
> > > about line feeds.  However, this solution only works with programs
> > > written in Java, and does not keep a copy of the existing log file on
> > > disk.
> > >
> > > I think your proposal is a good idea for tailing text files so that
> > > only line-delimited entries are sent.  How do we handle a partial line
> > > when the log file has rotated?
> > >
> > > regards,
> > > Eric
> > >
> > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > FileTailingAdaptor is great for tailing log files and sending them
> > > > to Hadoop. However, the last line of a chunk is usually cut, which
> > > > leads to some errors.
> > > >
> > > > I know that we can use CharFileTailingAdaptorUTF8 to solve this
> > > > problem. Nonetheless, this adaptor calls the MapProcessor.process()
> > > > method for every line in each chunk, which slows down the Demux
> > > > phase a lot.
> > > >
> > > > I suggest creating a new adaptor that would combine the benefits of
> > > > the two adaptors: the (Demux) speed of FileTailingAdaptor and
> > > > the preservation of lines from CharFileTailingAdaptorUTF8.
> > > >
> > > > The implementation of extractRecords() would be:
> > > > - "for loop" on the buffer, starting from the end of the buffer and
> > > >   going backward
> > > > - if we find a separator, save the offset and exit the loop
> > > > - rest of the method would be similar to CharFileTailingAdaptorUTF8.
> > > >
> > > > Could you guys please tell me what you think about it?
> > > > How do you currently manage the "cut lines" with Chukwa?
> > > >
> > > > Regards,
> > > >
> > > > Sourygna
> > > >
> > >
> >
>