

Creating a new adaptor: FileTailingAdaptor that would not cut lines
Hi all,

FileTailingAdaptor is great for tailing log files and sending them to Hadoop.
However, the last line of each chunk is usually cut, which leads to some errors.

I know that we can use CharFileTailingAdaptorUTF8 to solve this problem.
Nonetheless, this adaptor calls the MapProcessor.process() method for every
line in each chunk, which slows down the Demux phase considerably.

I suggest creating a new adaptor that would combine the benefits of the two
adaptors: the (Demux) speed of FileTailingAdaptor and
the preservation of lines from CharFileTailingAdaptorUTF8.

The implementation of extractRecords() would be:
- run a "for loop" over the buffer, starting from the end and going
backward
- when we find a separator, save its offset and exit the loop
- the rest of the method would be similar to CharFileTailingAdaptorUTF8.
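
To make the idea concrete, here is a minimal sketch of the backward scan only.
It is not the real Chukwa extractRecords() method: the class name
WholeLineChunkSketch and the helpers lastSeparatorIndex() and completeLines()
are hypothetical names made up for this illustration, and the separator is
assumed to be a single '\n' byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not part of Chukwa.
public class WholeLineChunkSketch {

    // Scan backward from the end of the buffer and return the index of the
    // last separator byte, or -1 if the buffer contains none.
    static int lastSeparatorIndex(byte[] buf, byte separator) {
        for (int i = buf.length - 1; i >= 0; i--) {
            if (buf[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    // Keep everything up to and including the last separator. The trailing
    // partial line is left out so that it can be re-read on the next pass.
    // Returns an empty array when no separator is found.
    static byte[] completeLines(byte[] buf, byte separator) {
        int last = lastSeparatorIndex(buf, separator);
        return Arrays.copyOfRange(buf, 0, last + 1);
    }

    public static void main(String[] args) {
        // Assumption: '\n' is the record separator. In UTF-8 the byte 0x0A
        // never occurs inside a multi-byte character, so a byte scan is safe.
        byte[] buf = "line one\nline two\npartial li".getBytes(StandardCharsets.UTF_8);
        byte[] chunk = completeLines(buf, (byte) '\n');
        System.out.print(new String(chunk, StandardCharsets.UTF_8));
    }
}
```

With this approach the whole buffer up to the saved offset would go out as a
single chunk, so the backward loop usually stops after a few bytes and no
per-line work is needed.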

Could you please tell me what you think about it?
How do you currently deal with cut lines in Chukwa?

Regards,

Sourygna