Flume, mail # user - Flume is replaying log for hours now


Re: Flume is replaying log for hours now
Brock Noland 2013-08-08, 12:19
use-fast-replay would help, but you'd need 4-5GB of heap per channel. With
heaps that large, you should be using dual checkpointing to avoid this.
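To make the dual-checkpointing suggestion concrete, here is a minimal sketch of a file-channel configuration, embedded in Java only so that it can be loaded with `java.util.Properties` and checked. The agent name `agent1`, the channel name `ch1` and every path are hypothetical. The property keys (`checkpointDir`, `dataDirs`, `useDualCheckpoints`, `backupCheckpointDir`, `use-fast-replay`) are the file-channel settings described in the Flume User Guide, which also states that `backupCheckpointDir` must be set when `useDualCheckpoints` is true and must not be the checkpoint directory or one of the data directories. The `check` method is our own illustration of those documented rules, not Flume's validation code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class DualCheckpointCheck {

    // Hypothetical agent ("agent1"), channel ("ch1") and paths; only the
    // property keys come from the Flume file-channel documentation.
    static final String CONFIG = String.join("\n",
        "agent1.channels.ch1.type = file",
        "agent1.channels.ch1.checkpointDir = /flume/ch1/checkpoint",
        "agent1.channels.ch1.dataDirs = /flume/ch1/data",
        "# Keep a second copy of the checkpoint",
        "agent1.channels.ch1.useDualCheckpoints = true",
        "agent1.channels.ch1.backupCheckpointDir = /flume/ch1/checkpoint-backup",
        "# Fast replay left off: it needs several GB of heap per channel",
        "agent1.channels.ch1.use-fast-replay = false");

    /** Returns the problems found; an empty list means the settings look consistent. */
    static List<String> check(Properties p, String prefix) {
        List<String> problems = new ArrayList<>();
        if (!Boolean.parseBoolean(p.getProperty(prefix + "useDualCheckpoints", "false"))) {
            return problems; // backups disabled: nothing to check
        }
        String checkpoint = p.getProperty(prefix + "checkpointDir", "").trim();
        String backup = p.getProperty(prefix + "backupCheckpointDir", "").trim();
        // dataDirs is a comma-separated list of directories
        List<String> dataDirs =
            Arrays.asList(p.getProperty(prefix + "dataDirs", "").trim().split("\\s*,\\s*"));
        if (backup.isEmpty()) {
            problems.add("useDualCheckpoints=true but backupCheckpointDir is not set");
        } else {
            if (backup.equals(checkpoint)) {
                problems.add("backupCheckpointDir is the same as checkpointDir");
            }
            if (dataDirs.contains(backup)) {
                problems.add("backupCheckpointDir is also listed in dataDirs");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties p = new Properties();
        p.load(new StringReader(CONFIG)); // '#' lines are treated as comments
        List<String> problems = check(p, "agent1.channels.ch1.");
        System.out.println(problems.isEmpty() ? "dual-checkpoint settings look consistent" : problems);
    }
}
```

In a real deployment the same key/value lines would go straight into the agent's properties file. With three file channels, as in the setup quoted further down, each channel would get its own checkpoint, backup and data directories.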

Here is the thread doing the replay:

"lifecycleSupervisor-1-0" prio=10 tid=0x00007f040472c800 nid=0x1332b runnable [0x00007f03f84ce000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:194)
        - locked <0x00000007256d3dc8> (a org.apache.flume.channel.file.FlumeEventQueue)
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
        at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
        at org.apache.flume.channel.file.Log.replay(Log.java:430)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        - locked <0x00000007256d2e38> (a org.apache.flume.channel.file.FileChannel)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        - locked <0x00000007256d2e38> (a org.apache.flume.channel.file.FileChannel)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

On Thu, Aug 8, 2013 at 12:52 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm trying to restart Flume. My setup is:
>
> Avro source => File channel 1 => HDFS sink
>             => File channel 2 => Another HDFS sink
>             => File channel 3 => File sink
>
> But it seems to have been doing replayLog for hours now. After seeing this
> yesterday, I even tried setting use-fast-replay=true, but it didn't help.
>
> Each file channel's capacity is 100000000; is this too high for Flume? I
> started with a lower number, but then it complained that the channel was
> getting full, so I made it higher.
>
> My log keeps writing lines like these:
> 08 Aug 2013 01:36:22,856 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 3240000 records
> 08 Aug 2013 01:36:41,324 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 3350000 records
> 08 Aug 2013 01:38:35,794 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 3250000 records
> 08 Aug 2013 01:40:48,759 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 3260000 records
> 08 Aug 2013 01:41:01,684 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 4090000 records
> 08 Aug 2013 01:41:36,691 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 4100000 records
> 08 Aug 2013 01:42:27,528 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 4110000 records
> 08 Aug 2013 01:42:57,725 INFO  [lifecycleSupervisor-1-1] (org.apache.flume.channel.file.ReplayHandler.replayLog:293)  - Read 3270000 records
>
>
> I'm attaching jstack output. I wasn't sure what the threads are doing, but
> in any case many of them seem to be waiting.
>
> Any idea what I can do to make the server start?
>
> Thanks
> Anat
>
>
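The quoted replay log lines give a rough way to size the problem. The sketch below (class and method names are our own) parses the timestamps of two "Read N records" lines from the same thread and computes the replay rate. It assumes the rate stays roughly constant, which the log does not guarantee: the two replay threads in the quoted output advance at visibly different paces. The total number of records left to replay is not stated in the thread, so the sketch reports time per additional million records rather than a finish time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class ReplayRate {
    // Matches the timestamp prefix of the quoted log lines, e.g. "08 Aug 2013 01:36:22,856".
    static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("dd MMM yyyy HH:mm:ss,SSS", Locale.ENGLISH);

    /** Records replayed per second between two "Read N records" log lines. */
    static double recordsPerSecond(String t0, long n0, String t1, long n1) {
        Duration d = Duration.between(LocalDateTime.parse(t0, TS), LocalDateTime.parse(t1, TS));
        return (n1 - n0) / (d.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        // First and last lines logged by thread lifecycleSupervisor-1-1 in the quoted output.
        double rate = recordsPerSecond("08 Aug 2013 01:36:22,856", 3_240_000L,
                                       "08 Aug 2013 01:42:57,725", 3_270_000L);
        double hoursPerMillion = 1_000_000 / rate / 3600.0;
        System.out.printf("%.1f records/s, about %.1f hours per additional million records%n",
                          rate, hoursPerMillion);
    }
}
```

For the lifecycleSupervisor-1-1 lines this works out to roughly 76 records per second (30000 records in about 395 seconds), i.e. a bit over 3.5 hours for every additional million records, which is consistent with a replay that runs for hours.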
--
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org