Chukwa >> mail # user >> the check point offset is bigger than the log file size


Re: the check point offset is bigger than the log file size
Rotation is a bit of a mess.

We've tried a couple of strategies to handle it, neither of which is perfect.
One approach is to have a modified logger that explicitly invokes
chukwa, starting and stopping adaptors.
The other is that the FileTailingAdaptors keep not only a physical
"how long is the file" offset, but also a logical "what is the byte
number of the first byte of the file" offset. The idea is that if the
file rotates, the adaptor should add the length of the rotated-out
section to the length of the current file.

This is a bit fragile, since the adaptor has to guess which was the
previously-rotated file. I believe we use timestamps for that. I
suspect it won't always work.
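
A minimal sketch of the logical-offset idea, in case it helps. This is
not the actual FileTailingAdaptor code: the class and method names are
hypothetical, and it assumes the adaptor already knows the length of the
rotated-out file and had read all of it before the rotation.

```java
// Hypothetical illustration only; not Chukwa's FileTailingAdaptor.
final class LogicalOffsetTracker {
    // Logical byte number of the first byte of the current file.
    private long baseOffset;
    // Physical position: bytes read so far in the current file.
    private long positionInFile;

    // Called after reading bytesRead bytes from the current file.
    void advance(long bytesRead) {
        positionInFile += bytesRead;
    }

    // Called when rotation is detected. rotatedOutLength is the length of
    // the file that was rotated out (assumed to have been fully read).
    void onRotation(long rotatedOutLength) {
        baseOffset += rotatedOutLength;
        positionInFile = 0;
    }

    // Offset that keeps growing across rotations.
    long logicalOffset() {
        return baseOffset + positionInFile;
    }

    // Offset within the current file only.
    long physicalOffset() {
        return positionInFile;
    }

    public static void main(String[] args) {
        LogicalOffsetTracker tracker = new LogicalOffsetTracker();
        tracker.advance(1000);     // read 1000 bytes of the first file
        tracker.onRotation(1000);  // that file is rotated out
        tracker.advance(250);      // read 250 bytes of the new file
        System.out.println(tracker.logicalOffset());   // 1250
        System.out.println(tracker.physicalOffset());  // 250
    }
}
```

With a scheme like this the committed offset never goes backwards after
a rotation, but it only works if the rotated-out file is identified
correctly.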

--Ari

On Tue, May 15, 2012 at 11:45 PM, IvyTang <[EMAIL PROTECTED]> wrote:
>     After reading the source code, I'm confused about the checkpoint file.
>
>     The file tailer puts chunks into the memlimitqueue, and the
> httpsender takes chunks from the memlimitqueue to send. After the
> httpsender has reliably sent the chunks to the collector,
> reportCommit(Adaptor src, long uuid) is called.
>
>    In this reportCommit(Adaptor src, long uuid) method, src is the
> adaptor and uuid is the offset in the file of the chunks that have been
> sent. If uuid > adaptor.offset, that means more chunks have been
> sent, so adaptor.offset is set to uuid.
>
>   This works fine when the log file is not rotating.
>
>     But if the log file rotates (I mean the log4j way: move the
> file to *.1 and create a new file named *), adaptor.offset is the offset
> of the chunks sent from the previous file, which is of course very large,
> while uuid is the offset of the chunks sent from the new file, so uuid is
> smaller than adaptor.offset.
>
>     So the checkpoint file won't change.
>
>     Chukwa keeps sending chunks to the collector, but if chukwa is
> restarted, the checkpoint is larger than the log file size, so the log
> file will be sent again.
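
A simplified model of the behaviour described above, as a sketch rather
than Chukwa's real code. The class CheckpointModel and its methods are
hypothetical. The "only move forward" rule follows the description of
reportCommit in this message, and the reset to 0 follows the "reset
counters to 0L" warning quoted further down. The two large numbers are
the checkpoint offset and the gamelog size reported in this thread.

```java
// Hypothetical model of the stuck-checkpoint problem; not Chukwa code.
final class CheckpointModel {
    private long committedOffset;

    CheckpointModel(long initialOffset) {
        this.committedOffset = initialOffset;
    }

    // The commit rule as described: the offset only ever moves forward.
    void reportCommit(long uuid) {
        if (uuid > committedOffset) {
            committedOffset = uuid;
        }
    }

    long committedOffset() {
        return committedOffset;
    }

    // Assumed restart rule: a checkpoint beyond the end of the file is
    // treated as an undetected rotation, so tailing restarts from 0.
    static long resumeOffset(long checkpointOffset, long fileLength) {
        return checkpointOffset > fileLength ? 0L : checkpointOffset;
    }

    public static void main(String[] args) {
        // Checkpoint left over from the previous (rotated-out) file.
        CheckpointModel model = new CheckpointModel(229406124L);

        // Commits for the new file carry small offsets and are ignored.
        model.reportCommit(5000L);
        System.out.println(model.committedOffset());  // 229406124

        // On restart the checkpoint exceeds the file size: resend all.
        System.out.println(resumeOffset(model.committedOffset(), 158023223L));  // 0
    }
}
```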
>
>
>
> On Mon, May 14, 2012 at 7:01 PM, IvyTang <[EMAIL PROTECTED]> wrote:
>>
>> The gamelog size is 158023223, but the checkpoint file contains
>>
>> ADD adaptor_2963225a90653a309cf779d4a1d815a3 >> org.apache.hadoop.chukwa.datacollection.adaptor.filetailer.CharFileTailingAdaptorUTF8
>> Gamelog 0 /var/log/dataproxy/gamelog 229406124
>>
>> The gamelog didn't rotate, I'm sure.
>>
>> But the offset in the checkpoint file is bigger than the file size, and chukwa logged:
>> WARN Thread-2 FileTailingAdaptor -
>> Adaptor|adaptor_2963225a90653a309cf779d4a1d815a3| file:
>> /var/log/dataproxy/gamelog, has rotated and no detection - reset counters to
>> 0L
>> And the agent began to transfer the whole log file.
>>
>> I'm just confused about why the agent generated an offset bigger than
>> the log size when the gamelog did not rotate.
>>
>> The chukwa version is 0.4.0
>>
>> --
>> Best regards,
>>
>> Ivy Tang
>>
>>
>>
>
>
>
> --
> Best regards,
>
> Ivy Tang
>
>
>

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department