Flume, mail # user - Flume startup takes ~ hour


Re: Flume startup takes ~ hour
Anat Rozenzon 2013-09-26, 19:13
Hari,

Maybe you can just send me the Java source for both classes?

Thanks
Anat
On Wed, Sep 25, 2013 at 9:29 AM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:

> OK, I understand.
>
> I can't apply the patch; I get a "format failed" error, not sure why.
> Is this a diff from trunk, or from some local version? I see some changes
> with no matching lines in the code.
>
> Many thanks
> Anat
>
>
> On Tue, Sep 24, 2013 at 9:15 PM, Hari Shreedharan <
> [EMAIL PROTECTED]> wrote:
>
>> That is actually a symptom of the real problem. The real problem is that
>> the remove method ends up hitting the main checkpoint data structure and
>> causes too many ops on the hash map. The real fix is in the patch I
>> mentioned, which reduces the number of ops tremendously.
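A rough, hypothetical sketch of the pattern described above, consistent with the stack trace quoted further down in the thread. The class and method names are stand-ins rather than Flume's actual code: if each replayed take has to locate its pointer by scanning queue slots, and every slot read is a boxed HashMap lookup in the backing store, the checkpoint map ends up handling on the order of (takes x queue size) operations during a single replay.

import java.util.HashMap;
import java.util.Map;

public class CheckpointOpsSketch {
    static long mapOps = 0;

    // Stand-in for the backing store's get(): one hash lookup per call,
    // with Integer.valueOf() boxing the int index into a key object.
    static Long slotValue(Map<Integer, Long> overwriteMap, int index) {
        mapOps++;
        return overwriteMap.get(index);
    }

    // Stand-in for a remove() that must scan the queue to find the pointer.
    static void remove(Map<Integer, Long> overwriteMap, int queueSize, long pointer) {
        for (int i = 0; i < queueSize; i++) {
            Long v = slotValue(overwriteMap, i);
            if (v != null && v == pointer) {
                overwriteMap.remove(i);
                return;
            }
        }
    }

    public static void main(String[] args) {
        int queueSize = 20_000;
        Map<Integer, Long> overwriteMap = new HashMap<>();
        for (int i = 0; i < queueSize; i++) {
            overwriteMap.put(i, 1_000_000L + i);
        }

        // Replaying 5,000 takes rescans the queue through the map every time.
        for (int p = 0; p < 5_000; p++) {
            remove(overwriteMap, queueSize, 1_000_000L + p);
        }
        System.out.println("map operations for 5,000 removes: " + mapOps);
        // Prints roughly 12.5 million, which is why cutting per-event map
        // traffic makes such a difference to replay (startup) time.
    }
}

Whether the real code paths match this shape exactly is an assumption; the point is only how quickly per-event map operations multiply during replay.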
>>
>>
>> Thanks,
>> Hari
>>
>> On Tuesday, September 24, 2013 at 6:12 AM, Anat Rozenzon wrote:
>>
>> For example this stack trace:
>>
>>
>> "lifecycleSupervisor-1-2" prio=10 tid=0x00007f89141d8800 nid=0x5ac8
>> runnable [0x00007f89501ad000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.lang.Integer.valueOf(Integer.java:642)
>>         at
>> org.apache.flume.channel.file.EventQueueBackingStoreFile.get(EventQueueBackingStoreFile.java:310)
>>         at
>> org.apache.flume.channel.file.FlumeEventQueue.get(FlumeEventQueue.java:225)
>>         at
>> org.apache.flume.channel.file.FlumeEventQueue.remove(FlumeEventQueue.java:195)
>>         - locked <0x00000006890f68f0> (a
>> org.apache.flume.channel.file.FlumeEventQueue)
>>         at
>> org.apache.flume.channel.file.ReplayHandler.processCommit(ReplayHandler.java:405)
>>         at
>> org.apache.flume.channel.file.ReplayHandler.replayLog(ReplayHandler.java:328)
>>         at org.apache.flume.channel.file.Log.doReplay(Log.java:503)
>>         at org.apache.flume.channel.file.Log.replay(Log.java:430)
>>         at
>> org.apache.flume.channel.file.FileChannel.start(FileChannel.java:302)
>>         - locked <0x00000006890ea360> (a
>> org.apache.flume.channel.file.FileChannel)
>>         at
>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>>         - locked <0x00000006890ea360> (a
>> org.apache.flume.channel.file.FileChannel)
>>         at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
>>         at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
>>         at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>         at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:724)
>>
>>
>>
>> On Tue, Sep 24, 2013 at 4:10 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>
>> After a deeper dive, it seems that the problem is with the HashMap usage
>> in EventQueueBackingStoreFile.
>>
>> Almost every time I run jstack, the JVM is inside
>> EventQueueBackingStoreFile.get(), doing either HashMap.containsKey() or
>> Integer.valueOf().
>> This is because overwriteMap is defined as a regular HashMap<Integer,
>> Long>.
>>
>> Does your fix solve this issue?
>>
>> I think maybe using a Long[] would be better.
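A rough, stand-alone sketch of the suggestion above; it is not the actual EventQueueBackingStoreFile, just a hypothetical comparison of a boxed HashMap<Integer, Long> lookup, which goes through Integer.valueOf() and a hash probe on every get, with the plain array read that a long[] (or the Long[] suggested above) would allow.

import java.util.HashMap;
import java.util.Map;

public class OverwriteMapSketch {
    public static void main(String[] args) {
        final int slots = 1_000_000;

        Map<Integer, Long> boxed = new HashMap<>(slots * 2);
        long[] flat = new long[slots];
        for (int i = 0; i < slots; i++) {
            boxed.put(i, (long) i);   // boxes both the key and the value
            flat[i] = i;
        }

        long sum = 0;

        // Boxed lookup: Integer.valueOf(i), hashCode/equals, then unboxing.
        long t0 = System.nanoTime();
        for (int i = 0; i < slots; i++) {
            sum += boxed.get(i);
        }
        long boxedNs = System.nanoTime() - t0;

        // Primitive lookup: a bounds-checked array read, no allocation at all.
        long t1 = System.nanoTime();
        for (int i = 0; i < slots; i++) {
            sum += flat[i];
        }
        long flatNs = System.nanoTime() - t1;

        System.out.printf("HashMap<Integer, Long>: %d ms, long[]: %d ms (sum=%d)%n",
                boxedNs / 1_000_000, flatNs / 1_000_000, sum);
    }
}

It is a crude measurement (single run, no warm-up), but it shows why the stack traces keep landing in Integer.valueOf(): every get on an int key has to obtain a boxed Integer before the hash probe even starts.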
>>
>>
>> On Tue, Sep 24, 2013 at 2:34 PM, Anat Rozenzon <[EMAIL PROTECTED]> wrote:
>>
>> Thanks Hari, great news, I'll be glad to test it.
>>
>> However, I don't have an environment with trunk; is there any way I can
>> get it packaged?
>>
>>
>> On Mon, Sep 23, 2013 at 8:50 PM, Hari Shreedharan <
>> [EMAIL PROTECTED]> wrote:
>>
>>  How many events does the File Channel get every 30 seconds, and how many
>> get taken out? This is one of the edge cases of the File Channel that I
>> have been working on ironing out. There is a patch on