Eric Yang 2011-12-30, 07:16
Data is discarded if the in-memory queue is full; the current implementation is designed to preserve the system rather than the data. If you need full reliability, I recommend writing the data to a file and using the utf8filetailing adaptor, so that the transport of every entry is tracked. In production there are usually many collectors, for both high availability and throughput, so agents are unlikely to fill up the in-memory queue. There is still room for improvement, though, e.g. adding a policy to discard either the most recent or the oldest data. Patches are welcome. :)
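As a rough illustration of the suggested improvement, here is a minimal Java sketch of a bounded queue that, when full, either rejects the newest arrival or evicts the oldest entry. `BoundedChunkQueue` and its `OverflowPolicy` enum are hypothetical names for illustration only; this is not Chukwa's actual agent queue, and for simplicity the capacity here counts entries rather than bytes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not Chukwa's real queue class.
final class BoundedChunkQueue<T> {
    // Which entry to sacrifice when the queue is full.
    enum OverflowPolicy { DROP_NEWEST, DROP_OLDEST }

    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;          // maximum number of queued entries
    private final OverflowPolicy policy;
    private long dropped = 0;            // how many entries were discarded so far

    BoundedChunkQueue(int capacity, OverflowPolicy policy) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.policy = policy;
    }

    /** Adds an item; returns false if the incoming item itself was discarded. */
    synchronized boolean offer(T item) {
        if (items.size() < capacity) {
            items.addLast(item);
            return true;
        }
        dropped++;
        if (policy == OverflowPolicy.DROP_NEWEST) {
            return false;                // keep what is queued, discard the new arrival
        }
        items.removeFirst();             // evict the oldest entry to make room
        items.addLast(item);
        return true;
    }

    /** Removes and returns the oldest item, or null if the queue is empty. */
    synchronized T poll() { return items.pollFirst(); }

    synchronized int size() { return items.size(); }

    synchronized long droppedCount() { return dropped; }
}
```

With `DROP_NEWEST`, a full queue keeps what it already holds and rejects new data, which seems closest to the behavior described above; `DROP_OLDEST` favors fresh data at the cost of older entries. In both cases the number of queued entries stays bounded while no collector is reachable.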
On Dec 29, 2011, at 10:38 PM, 陈镇海 <[EMAIL PROTECTED]> wrote:
> Hi Eric,
> When no collector is available, data is stored in the in-memory queue. In this
> case, if the amount of data is large and the memory size is limited,
> will it run "out of memory", and will the data be lost?
> 2011/12/30 Eric Yang <[EMAIL PROTECTED]>:
>> Data is stored in the agent's in-memory queue; the agent queues messages
>> if no collector is available. The data is out of order in /chukwa/repos
>> because it does not contain a timestamp. The demux parser does not know
>> how to sort the given data, so it is stored in arbitrary order. You
>> might be able to restore the original order by modifying the demux
>> parser to sort by SeqID. Hope this helps.
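The SeqID-based ordering suggested above could look roughly like the following Java sketch. `SeqRecord` and `SeqIdOrdering` are hypothetical stand-ins, not Chukwa's demux classes, and the sketch assumes each parsed record carries the sequence ID of the chunk it came from and that all records being sorted belong to the same stream.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one parsed record; not a Chukwa class.
final class SeqRecord {
    final long seqId;   // assumed: sequence ID of the source chunk
    final String body;  // payload, e.g. "hello"

    SeqRecord(long seqId, String body) {
        this.seqId = seqId;
        this.body = body;
    }
}

// Hypothetical helper showing the ordering step only.
final class SeqIdOrdering {
    /** Returns a sorted copy (ascending sequence ID); the input list is left untouched. */
    static List<SeqRecord> inOriginalOrder(List<SeqRecord> records) {
        List<SeqRecord> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparingLong(r -> r.seqId));
        return sorted;
    }
}
```

For example, records with (made-up) sequence IDs 30, 50, 40, 10 and 20 for "this is a test", "OK", "good job", "hello" and "world" come back as hello, world, this is a test, good job, OK. Sequence IDs from different sources are not comparable, so a real parser would sort within each stream.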
>> On Thu, Dec 29, 2011 at 6:22 PM, 陈镇海 <[EMAIL PROTECTED]> wrote:
>>> I'm using chukwa-0.4.0, with the agent and collector on the same
>>> machine. I found a problem when using UDPAdaptor.
>>> I added "add UDPAdaptor Packets 1234 0" to initial_adaptor. After
>>> starting the agent, the collector and start_data_processor, I used "nc"
>>> to send some data to this UDP port as follows:
>>> echo "hello" | nc -u 127.0.0.1 1234
>>> echo "world" | nc -u 127.0.0.1 1234
>>> echo "this is a test" | nc -u 127.0.0.1 1234
>>> echo "good job" | nc -u 127.0.0.1 1234
>>> echo "OK" | nc -u 127.0.0.1 1234
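The same test traffic can also be generated from Java, which may be handy for repeatable experiments. This minimal sketch sends each line as a separate UDP datagram with a trailing newline, mimicking `echo "..." | nc -u`; the class `UdpTestSender` is a hypothetical name, and port 1234 is taken from the adaptor line above.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical test helper: one datagram per line, like one `echo | nc -u` call each.
final class UdpTestSender {
    static void sendLines(String host, int port, List<String> lines) throws IOException {
        InetAddress addr = InetAddress.getByName(host);
        try (DatagramSocket socket = new DatagramSocket()) {
            for (String line : lines) {
                // echo appends a newline, so include it in the payload
                byte[] payload = (line + "\n").getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(payload, payload.length, addr, port));
            }
        }
    }
}
```

For example, `UdpTestSender.sendLines("127.0.0.1", 1234, List.of("hello", "world", "this is a test", "good job", "OK"));` (`List.of` needs Java 9 or later). UDP is fire-and-forget, so datagrams sent while nothing is listening on the port are simply lost.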
>>> After it had run for a while, I found data written to HDFS. In the
>>> directory "/chukwa/dataSinkArchives" the data was written in the
>>> correct order, but in the directory "/chukwa/repos" it was written in
>>> the wrong order, as follows:
>>> ............body this is a test
>>> ............body OK
>>> ............body good job
>>> ............body hello
>>> ............body world
>>> How did this happen?
>>> Another problem: I kept the agent running, stopped the collector, and
>>> continued to send data to the UDP port. After a while I started the
>>> collector again and found that the data was not lost. I want to know
>>> how and where that data was stored.
>>> Thanks a lot.