Flume, mail # user - Flume startup takes ~ hour


Re: Flume startup takes ~ hour
Anat Rozenzon 2013-09-24, 13:12
For example this stack trace:
"lifecycleSupervisor-1-2" prio=10 tid=0x00007f89141d8800 nid=0x5ac8 runnable [0x00007f89501ad000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Integer.valueOf(Integer.java:642)
        at org.apache.flume.channel.file.EventQueueBackingStoreFile.get(EventQueueBackingStoreFile.java:310)
        at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
        at org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
        - locked <0x00000006890f68f0> (a org.apache.flume.channel.file.FlumeEventQueue)
        at org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
        at org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
        at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
        at org.apache.flume.channel.file.Log.replay(Log.java:430)
        at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
        - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
        at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
        - locked <0x00000006890ea360> (a org.apache.flume.channel.file.FileChannel)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
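
The top frame above is Integer.valueOf called from EventQueueBackingStoreFile.get, which is what you would expect when an int index is boxed for a lookup in a HashMap<Integer, Long>. To illustrate the difference, here is a minimal sketch that contrasts a boxed map overlay with a primitive array overlay. Everything in it is hypothetical: the class names, the fallback parameter, and the fixed capacity are assumptions for illustration, not Flume's actual code. It also uses a primitive long[] with a presence flag rather than the Long[] mentioned in the quoted message below, since a Long[] would still box the values.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: these classes are hypothetical and are not Flume's code. */
class OverlaySketch {

    /** Boxed variant: keys and values are stored as Integer and Long objects. */
    static final class BoxedOverlay {
        private final Map<Integer, Long> overwrites = new HashMap<>();

        void put(int index, long value) {
            overwrites.put(index, value); // autoboxes both the key and the value
        }

        long get(int index, long fallback) {
            // Each call below boxes the int key via Integer.valueOf and hashes it.
            if (overwrites.containsKey(index)) {
                return overwrites.get(index); // unboxes the stored Long
            }
            return fallback;
        }
    }

    /** Primitive variant: one slot per index plus a presence flag, no boxing. */
    static final class PrimitiveOverlay {
        private final long[] values;
        private final boolean[] present;

        PrimitiveOverlay(int capacity) {
            values = new long[capacity];
            present = new boolean[capacity];
        }

        void put(int index, long value) {
            values[index] = value;
            present[index] = true;
        }

        long get(int index, long fallback) {
            return present[index] ? values[index] : fallback;
        }
    }

    public static void main(String[] args) {
        int capacity = 1_000_000; // assumed queue capacity, for illustration
        BoxedOverlay boxed = new BoxedOverlay();
        PrimitiveOverlay primitive = new PrimitiveOverlay(capacity);

        // Overwrite every tenth slot in both overlays.
        for (int i = 0; i < capacity; i += 10) {
            boxed.put(i, i * 2L);
            primitive.put(i, i * 2L);
        }

        // Both overlays must agree on every lookup.
        long sumBoxed = 0;
        long sumPrimitive = 0;
        for (int i = 0; i < capacity; i++) {
            sumBoxed += boxed.get(i, -1L);
            sumPrimitive += primitive.get(i, -1L);
        }
        System.out.println("overlays agree: " + (sumBoxed == sumPrimitive));
    }
}
```

The trade-off is memory: the array variant allocates a slot for every index up front, whether or not it is ever overwritten, so it only makes sense when the capacity is bounded and known at startup.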

On Tue, Sep 24, 2013 at 4:10 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:

> After a deeper dive, it seems that the problem is the HashMap usage in
> EventQueueBackingStoreFile.
>
> Almost every time I run jstack, the JVM is inside
> EventQueueBackingStoreFile.get(), doing either HashMap.containsKey() or
> Integer.valueOf().
> This is because overwriteMap is defined as a regular HashMap<Integer,
> Long>.
>
> Does your fix solve this issue?
>
> I think using a Long[] might be better.
>
>
> On Tue, Sep 24, 2013 at 2:34 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>
>> Thanks Hari, great news, I'll be glad to test it.
>>
>> However, I don't have an environment with trunk. Is there any way I can
>> get it packaged?
>>
>>
>> On Mon, Sep 23, 2013 at 8:50 PM, Hari Shreedharan <
>> [EMAIL PROTECTED]> wrote:
>>
>>>  How many events does the File Channel get every 30 seconds and how many
>>> get taken out? This is one of the edge cases of the File Channel I have
>>> been working on ironing out. There is a patch on
>>> https://issues.apache.org/jira/browse/FLUME-2155 (the
>>> FLUME-2155-initial.patch file). If you have data that takes an hour to
>>> start, and don't mind testing out this patch (this might be buggy, cause
>>> data loss, hangs etc - so testing in prod is not recommended), apply this
>>> patch to trunk and test it out, and see if it improves the startup time.
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Monday, September 23, 2013 at 9:16 AM, Anat Rozenzon wrote:
>>>
>>> Hi,
>>>
>>> I have a Flume instance that collects logs from several Flume
>>> agents using an Avro source and a file channel.
>>> Recently, when I restart the collector, it takes about an hour to
>>> start listening on the Avro port.
>>> Please see a jstack entry below. Any idea why the startup is slow?
>>>
>>> Thanks
>>> Anat
>>>
>>> "lifecycleSupervisor-1-0" prio=10 tid=0x00007f01505e4800 nid=0x4c78 runnable [0x00007f01441d6000]
>>>    java.lang.Thread.State: RUNNABLE
>>>         at org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)