Re: Stopping ExecSource takes very long time (about 6 hours)
You Hoken 2014-01-21, 06:29
Sorry for sending the wrong email.
2014/01/21 15:17 "You Hoken" <[EMAIL PROTECTED]>:
> I am using ExecSource to run a resident shell program via the "rsh" command.
> The resident shell program is a simple script that runs "tail" on a log file
> located on the remote server (AIX) reached via "rsh".
> Flume: 1.3.1
> JDK: 1.6.0
> Linux executing Flume (ExecSource): SUSE Linux Enterprise Server 11 SP2
> AIX: V5.2
> In this setup, when I stop Flume, it takes a very long time (about 6 hours)
> for ExecSource to stop.
> The details are as follows.
> About 6 hours passed between log lines (1) and (2).
> (1) INFO [node-shutdownHook] (org.apache.flume.source.ExecSource.stop:178)
> - Stopping exec source with command:rsh serverXXX sh YYY.sh
> (2) INFO [pool-4-thread-1]
> .run:307) - Command rsh serverXXX sh YYY.sh] exited with 0
> This happens every time.
> I guess the TCP keepalive setting of the OS (SUSE Linux) affects this.
> But I still don't know why it takes 6 hours to stop ExecSource.
> So, to find the cause, I debugged these processes, and the result is the
> following call order:
> 1. ExecSource#stop: Process#destroy
> 2. ExecSource#stop: Process#waitFor (start waiting for response No.1)
> 3. ExecSource#run: Process#getErrorStream
> 4. ExecSource#run: Process#destroy
> 5. ExecSource#run: Process#waitFor (start waiting for response No.4)
> 6. ExecSource#run: Process#waitFor (end waiting for response No.4)
> 7. ExecSource#stop: Process#waitFor (end waiting for response No.1)
> You can see that the waitFor started in No.5 returns (No.6) before the
> waitFor started in No.2 returns (No.7).
> I think the thread safety (synchronized (process)) is not working as
> intended.
> Is this execution order correct?
> Do you think this execution order caused my problem?
> By debugging, I am now sure of the following.
> 1. Two threads (ExecSource#stop and ExecSource#run) are executed at the
> same time.
> 2. ExecSource#stop seems to wait for a response at Process#waitFor after
> 3. After Process#getErrorStream, ExecSource#run seems to wait for a response
> at Process#waitFor after java.lang.Process#destroy.
> Given the above, I am worried that if the external process writes to
> standard error after being destroyed, the buffer on the client side might
> fill up and cause a deadlock at Process#waitFor.
> So I think standard error should be read in another thread before
> executing waitFor (after executing destroy in ExecSource#stop).
> What do you think?
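
To make the proposal above concrete, here is a minimal sketch of reading standard error on a separate thread so that waitFor is not blocked by unread output. This is not the actual ExecSource code. The names StderrDrainSketch, Drainer, awaitOutput and destroyAndWait are hypothetical, UTF-8 output is assumed, and the reader is started just before destroy (a small variation on the order described above), since the stream may no longer be readable after destroy on some JDKs. It is written to compile on JDK 1.6.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical sketch, not the Flume ExecSource implementation.
class StderrDrainSketch {

    /** Reads a stream to EOF on its own daemon thread. */
    public static final class Drainer extends Thread {
        private final InputStream in;
        // StringBuffer is synchronized, so the caller can read it safely.
        private final StringBuffer captured = new StringBuffer();

        public Drainer(InputStream in) {
            super("stderr-drainer");
            this.in = in;
            setDaemon(true); // never keep the JVM alive just for draining
        }

        @Override
        public void run() {
            BufferedReader reader = null;
            try {
                // Assumption: the external command writes UTF-8 text.
                reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
                String line;
                while ((line = reader.readLine()) != null) {
                    captured.append(line).append('\n');
                }
            } catch (IOException e) {
                // The stream was closed, e.g. because the process was destroyed.
            } finally {
                if (reader != null) {
                    try { reader.close(); } catch (IOException ignored) { }
                }
            }
        }

        /** Waits up to timeoutMillis for EOF, then returns what was read so far. */
        public String awaitOutput(long timeoutMillis) {
            try {
                join(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return captured.toString();
        }
    }

    /** Starts draining stderr, then destroys the process and waits for it. */
    public static int destroyAndWait(Process process) throws InterruptedException {
        Drainer drainer = new Drainer(process.getErrorStream());
        drainer.start();
        process.destroy();
        int exitCode = process.waitFor();
        drainer.awaitOutput(1000L); // bounded wait for the remaining stderr
        return exitCode;
    }
}
```

In this sketch only one thread would call destroyAndWait for a given Process. Whether this removes the 6-hour delay is untested; it only addresses the possible blocking on unread standard error, not the TCP keepalive behaviour of rsh.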