Flume >> mail # user >> Flume startup takes ~ hour


Thread:
  Anat Rozenzon      2013-09-23, 16:16
  Hari Shreedharan   2013-09-23, 17:50
  Anat Rozenzon      2013-09-24, 11:34
  Anat Rozenzon      2013-09-24, 13:10
  Anat Rozenzon      2013-09-24, 13:12
  Hari Shreedharan   2013-09-24, 18:15

Re: Flume startup takes ~ hour
OK, I understand.

I can't apply the patch; I get a "format failed" error and I'm not sure why.
Is this a diff against trunk, or against some local version? I see some hunks
with no matching lines in the code.

Many thanks
Anat
On Tue, Sep 24, 2013 at 9:15 PM, Hari Shreedharan <[EMAIL PROTECTED]
> wrote:

> That is actually a symptom of the real problem: the remove method ends up
> hitting the main checkpoint data structure, causing too many operations on
> the hash map. The real fix is in the patch I mentioned, which reduces the
> number of ops tremendously.
>
>
> Thanks,
> Hari
>
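
[The staging idea Hari describes (fewer operations on the main checkpoint map) can be sketched in miniature. The class below is purely illustrative and is not the actual FLUME-2155 patch: it stages puts in a small write buffer so that a put followed by a remove of the same slot cancels out without ever touching the main map.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one way to "reduce the number of ops" on a main
// map: stage puts in a buffer, let a matching remove cancel the staged put,
// and fold only the survivors into the main structure at commit time.
public class StagedQueue {
    private final Map<Integer, Long> checkpoint = new HashMap<>(); // main structure
    private final Map<Integer, Long> staged = new HashMap<>();     // write buffer
    private long mainMapOps = 0;                                   // ops on 'checkpoint'

    void put(int slot, long ptr) {
        staged.put(slot, ptr);               // cheap: never touches 'checkpoint'
    }

    void remove(int slot) {
        if (staged.remove(slot) == null) {   // not staged: must hit the main map
            checkpoint.remove(slot);
            mainMapOps++;
        }                                    // else: put+remove cancelled for free
    }

    void commit() {                          // apply surviving staged entries once
        for (Map.Entry<Integer, Long> e : staged.entrySet()) {
            checkpoint.put(e.getKey(), e.getValue());
            mainMapOps++;
        }
        staged.clear();
    }

    long mainMapOps() { return mainMapOps; }
    int size() { return checkpoint.size() + staged.size(); }
}
```

[Whether the real patch works exactly this way is not shown in this thread; the sketch only demonstrates how staging can cut the op count on the main structure during a replay-style workload.]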
> On Tuesday, September 24, 2013 at 6:12 AM, Anat Rozenzon wrote:
>
> For example this stack trace:
>
>
> "lifecycleSupervisor-1-2" prio=10 tid=0x00007f89141d8800 nid=0x5ac8
> runnable [0x00007f89501ad000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Integer.valueOf(Integer.java:642)
>         at
> org.apache.flume.channel.file.EventQueueBackingStoreFile.get(EventQueueBackingStoreFile.java:310)
>         at
> org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
>         at
> org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
>         - locked <0x00000006890f68f0> (a
> org.apache.flume.channel.file.FlumeEventQueue)
>         at
> org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
>         at
> org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
>         at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
>         at org.apache.flume.channel.file.Log.replay(Log.java:430)
>         at
> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
>         - locked <0x00000006890ea360> (a
> org.apache.flume.channel.file.FileChannel)
>         at
> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>         - locked <0x00000006890ea360> (a
> org.apache.flume.channel.file.FileChannel)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
>
>
>
> On Tue, Sep 24, 2013 at 4:10 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>
> After a deeper dive, it seems the problem is HashMap usage in
> EventQueueBackingStoreFile.
>
> Almost every time I run jstack the JVM is inside
> EventQueueBackingStoreFile.get() doing either HashMap.containsKey() or
> Integer.valueOf().
> This is because overwriteMap is defined as a regular HashMap<Integer,
> Long>.
>
> Does your fix solve this issue?
>
> I think using a primitive long[] might be better.
>
>
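
[Anat's observation can be reproduced in isolation: every get on a HashMap<Integer, Long> boxes the int key via Integer.valueOf before the hash lookup, while a primitive long[] indexed directly by slot involves no boxing at all. A minimal sketch; the class and method names here are made up for illustration and are not Flume code.]

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OverwriteLookup {
    // Boxed variant: mirrors a field declared as HashMap<Integer, Long>.
    // Every call boxes the int key (Integer.valueOf) before the hash lookup.
    static long boxedLookup(Map<Integer, Long> map, int slot, long fallback) {
        Long v = map.get(slot);              // autoboxing of 'slot' happens here
        return v != null ? v : fallback;
    }

    // Primitive variant: a long[] indexed directly by slot, no boxing.
    // Long.MIN_VALUE is used as a sentinel meaning "no value recorded".
    static long arrayLookup(long[] slots, int slot, long fallback) {
        long v = slots[slot];
        return v != Long.MIN_VALUE ? v : fallback;
    }

    public static void main(String[] args) {
        Map<Integer, Long> map = new HashMap<>();
        map.put(2, 42L);

        long[] slots = new long[4];
        Arrays.fill(slots, Long.MIN_VALUE);
        slots[2] = 42L;

        System.out.println(boxedLookup(map, 2, -1L));   // prints 42
        System.out.println(arrayLookup(slots, 2, -1L)); // prints 42
        System.out.println(arrayLookup(slots, 0, -1L)); // prints -1
    }
}
```

[If the slot space is bounded, as a channel checkpoint's is, a flat array or a primitive-keyed map avoids the per-call Integer.valueOf boxing visible in the stack trace above.]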
> On Tue, Sep 24, 2013 at 2:34 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>
> Thanks Hari, great news, I'll be glad to test it.
>
> However, I don't have an environment with trunk; is there any way I can get
> it packaged somehow?
>
>
> On Mon, Sep 23, 2013 at 8:50 PM, Hari Shreedharan <
> [EMAIL PROTECTED]> wrote:
>
> How many events does the File Channel get every 30 seconds, and how many
> get taken out? This is one of the edge cases of the File Channel I have
> been working on ironing out. There is a patch on
> https://issues.apache.org/jira/browse/FLUME-2155 (the
> FLUME-2155-initial.patch file). If you have data that takes an hour to
> start and don't mind testing this patch (it might be buggy, cause data
> loss, hangs, etc., so testing in prod is not recommended), apply this

Anat Rozenzon 2013-09-26, 19:13