Chukwa, mail # dev - Creating a new adaptor: FileTailingAdaptor that would not cut lines


Re: Creating a new adaptor: FileTailingAdaptor that would not cut lines
Eric Yang 2013-04-22, 04:25
maxReadSize can be increased in the configuration.  If a larger
maxReadSize is preferred, we can raise the default value.

regards,
Eric
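
For illustration only, here is a minimal Java sketch of the behaviour discussed in this thread: cut each chunk at the last '\n' so no line is split, and temporarily widen the read window when a single record is longer than the configured maximum. The class and method names (LineBoundedRead, completeLineLength) and the window-doubling strategy are assumptions made for this sketch; this is not Chukwa's actual adaptor code.

```java
import java.nio.charset.StandardCharsets;

public class LineBoundedRead {

    /**
     * Hypothetical helper: returns how many bytes, starting at offset,
     * can be sent so that the chunk ends on a '\n'. Returns 0 when no
     * complete line is available yet.
     */
    static int completeLineLength(byte[] data, int offset, int maxReadSize) {
        int available = data.length - offset;
        int window = Math.min(maxReadSize, available);
        while (window > 0) {
            // Scan backwards from the end of the window for the last newline.
            for (int i = offset + window - 1; i >= offset; i--) {
                if (data[i] == '\n') {
                    return i - offset + 1;
                }
            }
            if (window == available) {
                return 0; // no newline in the remaining data: wait for more
            }
            // The record is longer than the window: widen it temporarily
            // (assumption of this sketch: double it each time).
            window = (int) Math.min((long) window * 2, available);
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] log = "short\nthis line is longer than the window\npartial"
                .getBytes(StandardCharsets.US_ASCII);
        int offset = 0;
        int length;
        // Send complete lines only; the trailing "partial" is held back.
        while ((length = completeLineLength(log, offset, 8)) > 0) {
            System.out.println("send " + length + " bytes from offset " + offset);
            offset += length;
        }
    }
}
```

With this approach the configured maximum only bounds the normal read size; an oversized record is still delivered whole instead of being dropped, at the cost of a larger temporary buffer.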

On Sun, Apr 21, 2013 at 3:07 PM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:

> As I said before, I don't think Chukwa should handle those situations, since
> I think this is a "log rotation" problem.
> Personally, I have never seen such a problem (the log4j RFA, for instance,
> has a kind of "flexible" size, and every rotated file ended with a \n).
>
> On the other hand, there is a special situation I think Chukwa should take
> care of.
> The default value of the configuration property
> "chukwaAgent.fileTailingAdaptor.maxReadSize" is 128 kB, which means that if
> a line/record is bigger than that size, the record won't be sent by the
> agent.
> We'll get a warning in Chukwa's log, but the record will be lost (see
> the LWFTAdaptor.slurp() method).
> In such a case, would it be possible to temporarily increase MAX_READ_SIZE
> so that we are able to send one record on the wire?
>
> Regards,
>
> Sourygna
>
>
>
>
> On Sun, Apr 21, 2013 at 7:05 PM, Eric Yang <[EMAIL PROTECTED]> wrote:
>
> > Do we need to consider rotation based on size?  For example, take the
> > last line of a log file that reaches 300MB.  There is no line break in
> > the first file, but the entry continues in the next rotated log, which
> > then has a line feed delimiter.  If we are splitting lines based on \n,
> > then we can reconstruct the full line across the two files.  I am not
> > sure whether this case needs to be supported.
> >
> > regards,
> > Eric
> >
> >
> > On Fri, Apr 19, 2013 at 12:01 PM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:
> >
> > > Well, the log4j socket adaptor may be great if you control the
> > > software that generates the logs.
> > > That is not usually my case: customers don't really like having to
> > > install a Chukwa agent on their production servers, so I don't want
> > > to think about telling them to change the logging system of their
> > > software.
> > >
> > > As for partial lines when log files rotate, I don't think this is
> > > something Chukwa should manage (what is more, how could Chukwa be
> > > aware there is a problem?).
> > > In my view, this would be an error of the "logrotate" system. As far
> > > as I know, the RFA and DRFA log4j appenders handle rotation quite
> > > well.
> > >
> > > Regards,
> > >
> > > Sourygna
> > >
> > >
> > > On Fri, Apr 19, 2013 at 8:17 AM, Eric Yang <[EMAIL PROTECTED]> wrote:
> > >
> > > > I think the best solution is to use the Log4j socket appender and
> > > > the Chukwa log4j socket adaptor to get the full log entry without
> > > > worrying about line feeds.  However, this solution only works with
> > > > programs written in Java, and it does not keep a copy of the
> > > > existing log file on disk.
> > > >
> > > > I think your proposal is a good way to handle tailing text files
> > > > so that only line-delimited entries are sent.  How do we handle a
> > > > partial line when the log file has rotated?
> > > >
> > > > regards,
> > > > Eric
> > > >
> > > > On Thu, Apr 18, 2013 at 11:33 AM, Luangsay Sourygna <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > FileTailingAdaptor is great for tailing log files and sending
> > > > > them to Hadoop.
> > > > > However, the last line of a chunk is usually cut, which leads to
> > > > > some errors.
> > > > >
> > > > > I know that we can use CharFileTailingAdaptorUTF8 to solve this
> > > > > problem.
> > > > > Nonetheless, this adaptor calls the MapProcessor.process() method
> > > > > for every line in each chunk, which slows the Demux phase a lot.
> > > > >
> > > > > I suggest creating a new adaptor that would combine the benefits
> > > > > of the two adaptors: the (Demux) speed of FileTailingAdaptor and
> > > > > the preservation of lines from CharFileTailingAdaptorUTF8.
> > > > >
> > > > > The implementation of the extractRecords() would be:
> > > > > - "for loop" on the buffer, starting from the end of the buffer and