Flume >> mail # user >> Flume-NG 1.3.1 : Spooling dir source : java.io.IOException: Stream closed


Re: Flume-NG 1.3.1 : Spooling dir source : java.io.IOException: Stream closed
Nguyen,

It might help to save the original log data in a separate directory
first, and then use a separate script or program to periodically send the
*diffs* to the directory being spooled by Flume.

If the log files are rolled, your script may need to track the timestamp
of the last event in the last diff it generated.

The diffs can carry a timestamp (in milliseconds) in the file name to
prevent naming conflicts in the spooling directory.

Using diffs prevents events from the same original file from being logged
more than once.

It also ensures that the files you drop into the spooled directory are
not modified while Flume is ingesting their events.
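
A minimal sketch of such a diff script in Python, under some assumptions of mine: the function name `ship_new_bytes`, the staging directory, the state file, and the `.diff` suffix are all hypothetical, and the script tracks a byte offset rather than the timestamp of the last event. It treats a source file that has shrunk as rolled, which misses the case where a rolled file grows past the old offset between runs. The staging directory is assumed to be on the same filesystem as the spooling directory so that the final rename is atomic.

```python
import os
import time
from pathlib import Path
from typing import Optional


def ship_new_bytes(source: Path, spool_dir: Path,
                   staging_dir: Path, state_file: Path) -> Optional[Path]:
    """Copy lines appended to `source` since the last run into a new file
    in `spool_dir`. Returns the new file's path, or None if nothing shipped."""
    # Byte offset already shipped on earlier runs (0 on the first run).
    offset = int(state_file.read_text()) if state_file.exists() else 0

    size = source.stat().st_size
    if size < offset:
        # Assumption: a smaller file means the log was rolled or truncated.
        offset = 0
    if size == offset:
        return None

    with source.open("rb") as f:
        f.seek(offset)
        chunk = f.read(size - offset)

    # Ship complete lines only; a trailing partial line waits for the next run.
    end = chunk.rfind(b"\n") + 1
    if end == 0:
        return None
    chunk = chunk[:end]

    # Millisecond timestamp in the name to avoid conflicts in the spool dir.
    name = f"{source.name}.{int(time.time() * 1000)}.diff"

    # Write outside the spool dir, then rename, so the spooled file is
    # complete when it appears and is never modified afterwards.
    staged = staging_dir / name
    staged.write_bytes(chunk)
    final = spool_dir / name
    os.replace(staged, final)

    # Record progress only after the diff is in place.
    state_file.write_text(str(offset + len(chunk)))
    return final
```

You would run this periodically (from cron, for example) with `source` pointing at the live log and `spool_dir` at the directory configured as `spoolDir`. If the script dies between the rename and the state update, the next run ships the same lines again, so this sketch is at-least-once, not exactly-once.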

Hope this helps.

On Mon, Jan 28, 2013 at 3:02 AM, NGuyen thi Kim Tuyen <[EMAIL PROTECTED]> wrote:

> Thanks for your reply.
>
> If the spooling source only works on "done", immutable files, it is not
> suitable for my problem. I think I'll use the exec tail command instead. But
> there is a warning at http://flume.apache.org/FlumeUserGuide.html#exec-source : The
> problem with ExecSource and other asynchronous sources is that the source
> can not guarantee that if there is a failure to put the event into the
> Channel the client knows about it. ..... For stronger reliability
> guarantees, consider the Spooling Directory Source or direct integration
> with Flume via the SDK.
>
> I'm still deciding between ExecSource and Log4jAppender.
> http://www.slideshare.net/sematext/search-analytics-with-flume-and-hbase
>
>
> Could you share your opinion?
>
> On Mon, Jan 28, 2013 at 2:29 PM, Mike Percy <[EMAIL PROTECTED]> wrote:
>
>> Hi Nguyễn,
>> The spooling source only works on "done", immutable files. So they have
>> to be atomically moved and they cannot be modified after being placed into
>> the spooling directory.
>>
>> Regards,
>> Mike
>>
>>
>> On Sun, Jan 27, 2013 at 11:14 PM, NGuyen thi Kim Tuyen <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> Please help me.
>>>
>>> I want to use Flume in the following setup:
>>> Spooling directory source --> FileChannel --> HBase sink. But I have
>>> some problems with the spooling directory source.
>>>
>>> Here is my test flume.conf:
>>> t-game-db194.sources = test-hbase
>>>
>>> t-game-db194.sinks = sink-hbase
>>>
>>> t-game-db194.channels = hbase-channel
>>>
>>> #source spoolDir
>>> t-game-db194.sources.test-hbase.type = spooldir
>>>
>>> t-game-db194.sources.test-hbase.spoolDir =/var/log/testhbase
>>>
>>> t-game-db194.sources.test-hbase.fileHeader = true
>>>
>>> t-game-db194.sources.test-hbase.channels = hbase-channel
>>>
>>> #file Channel
>>> t-game-db194.channels.hbase-channel.type = file
>>>
>>> t-game-db194.channels.hbase-channel.checkpointDir = /var/log/flume-ng/checkpoint
>>>
>>> t-game-db194.channels.hbase-channel.dataDir = /var/log/flume-ng/filedata
>>>
>>>
>>> #sink
>>> t-game-db194.sinks.sink-hbase.type = logger
>>>
>>> t-game-db194.sinks.sink-hbase.channel = hbase-channel
>>>
>>> I then tested with: echo "tuyen ssssssssss " >>
>>> "/var/log/testhbase/hbase_1.log". The first event is OK, but subsequent
>>> events do not work. Here is flume.log:
>>>
>>> 28 Jan 2013 13:16:47,424 INFO  [lifecycleSupervisor-1-0]
>>> (org.apache.flume.source.SpoolDirectorySource.start:64)  -
>>> SpoolDirectorySource source starting with directory:/var/log/testhbase
>>> 28 Jan 2013 13:16:47,732 INFO  [pool-7-thread-1]
>>> (org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile:229)
>>>  - Preparing to move file /var/log/testhbase/hbase_1.log to
>>> /var/log/testhbase/hbase_1.log.COMPLETED
>>> 28 Jan 2013 13:16:48,436 INFO
>>>  [SinkRunner-PollingRunner-DefaultSinkProcessor]
>>> (org.apache.flume.sink.LoggerSink.process:70)  - Event: {
>>> headers:{file=/var/log/testhbase/hbase_1.log} body: 74 75 79 65 6E 20 73 73
>>> 73 73 73 73 73 73 73 73 tuyen ssssssssss }
>>>
>>> 28 Jan 2013 13:17:08,836 INFO  [pool-7-thread-1]
>>> (org.apache.flume.client.avro.SpoolingFileLineReader.retireCurrentFile:229)
>>>  - Preparing to move file /var/log/testhbase/hbase_1.log to

°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/