
Avro, mail # user - Re: Stopping ExecSource takes very long time (about 6 hours)


You Hoken 2014-01-21, 06:29
Sorry for sending the wrong email.
 2014/01/21 15:17 "You Hoken" <[EMAIL PROTECTED]>:

> Hi,
>
> I am using ExecSource to run a resident shell program via the "rsh" command.
> The resident shell program is a simple script that runs "tail" on a log
> file on the remote server (AIX) accessed through rsh.
>
> Flume: 1.3.1
> JDK: 1.6.0
> Linux executing Flume (ExecSource): SUSE Linux Enterprise Server 11 SP2
> AIX: V5.2
>
> In this setup, when I stop Flume, it takes a very long time (about 6 hours)
> for ExecSource to stop.
>
> The details are as follows.
> It took about 6 hours between (1) and (2).
> (1) INFO  [node-shutdownHook] (org.apache.flume.source.ExecSource.stop:178)
>      - Stopping exec source with command:rsh serverXXX sh YYY.sh
> (2) INFO  [pool-4-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:307)
>      - Command [rsh serverXXX sh YYY.sh] exited with 0
>
> This happens every time.
> I suspect the TCP keepalive setting of the OS (SUSE Linux) affects this.
> But I still don't know why it takes 6 hours for ExecSource to stop.
>
> To find the cause, I debugged the shutdown and observed the following
> call order:
>    1. ExecSource#stop: Process#destroy
>    2. ExecSource#stop: Process#waitFor (starts waiting after the destroy in step 1)
>    3. ExecSource#run:  Process#getErrorStream
>    4. ExecSource#run:  Process#destroy
>    5. ExecSource#run:  Process#waitFor (starts waiting after the destroy in step 4)
>    6. ExecSource#run:  Process#waitFor (the wait from step 5 returns)
>    7. ExecSource#stop: Process#waitFor (the wait from step 2 returns)
>
> As you can see, the waitFor that starts in step 5 returns (step 6) before
> the waitFor that starts in step 2 returns (step 7).
> It seems that the synchronization (synchronized (process)) is not effective.
> Is this execution order expected?
> Do you think this execution order caused my problem?
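One general Java behavior may explain why synchronized (process) looks ineffective here. Assuming (not verified in this thread) that the JDK's Process#waitFor blocks via Object#wait on the Process object itself, a thread waiting inside synchronized (process) releases that monitor, so a second thread can enter its own synchronized (process) block before the first one leaves. The minimal sketch below shows that behavior with a plain lock object; the class and method names are hypothetical, and it assumes Java 8+ (lambdas), unlike the JDK 1.6 used above.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class: a thread blocked in lock.wait() inside
// synchronized (lock) releases the monitor, so another thread can
// enter synchronized (lock) while the first is still inside its block.
class MonitorReleaseDemo {

    private final Object lock = new Object();
    private boolean firstInside = false; // guarded by lock
    private boolean released = false;    // guarded by lock

    /** Returns true if the second thread acquired the monitor while the first was still inside its block. */
    static boolean secondEnteredWhileFirstInside() throws InterruptedException {
        final MonitorReleaseDemo d = new MonitorReleaseDemo();
        final boolean[] sawFirstInside = new boolean[1];
        final CountDownLatch firstEntered = new CountDownLatch(1);

        Thread first = new Thread(() -> {
            synchronized (d.lock) {
                d.firstInside = true;
                firstEntered.countDown();
                try {
                    while (!d.released) {
                        d.lock.wait(); // releases the monitor of d.lock while waiting
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                d.firstInside = false;
            }
        });

        Thread second = new Thread(() -> {
            try {
                firstEntered.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // Acquires the monitor only once the first thread is parked in wait().
            synchronized (d.lock) {
                sawFirstInside[0] = d.firstInside;
                d.released = true;
                d.lock.notifyAll();
            }
        });

        first.start();
        second.start();
        first.join();
        second.join();
        return sawFirstInside[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(secondEnteredWhileFirstInside()); // prints true
    }
}
```

The first thread cannot leave its block until the second sets released, so the second thread always observes firstInside == true.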
>
> From debugging, I am now sure of the following:
> 1. Two threads (ExecSource#stop and ExecSource#run) execute at the
>    same time.
> 2. ExecSource#stop waits in Process#waitFor after calling
>    java.lang.Process#destroy.
> 3. After Process#getErrorStream, ExecSource#run also waits in
>    Process#waitFor after calling java.lang.Process#destroy.
>
> Given the above, I am worried that if the external process writes to
> standard error after being destroyed and nobody reads it, the pipe buffer
> on the client side could fill up and cause a deadlock in Process#waitFor.
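The hazard described above can be reproduced in isolation: a child that writes more to stderr than the OS pipe buffer holds blocks until someone reads the pipe, so waitFor does not return while stderr is never drained. This is a minimal sketch, not Flume code; the class name is hypothetical, and it assumes a POSIX system with sh and head, a default pipe buffer well below 4 MB (64 KiB on Linux), and Java 8+ for waitFor(long, TimeUnit).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical demo: nobody reads the child's stderr, so the child
// blocks once the pipe buffer is full and never exits on its own.
class UndrainedStderrDemo {

    /** Returns true if the child exits within timeoutMillis even though its stderr is never read. */
    static boolean exitsWithoutDraining(long timeoutMillis) throws Exception {
        // The child writes 4,000,000 bytes to stderr; nothing in this JVM reads them.
        Process p = new ProcessBuilder("sh", "-c", "exec head -c 4000000 /dev/zero 1>&2").start();
        try {
            return p.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            p.destroy();   // SIGTERM on Unix-like systems; ends the blocked child
            p.waitFor();
            p.getErrorStream().close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(exitsWithoutDraining(2000)); // expected: false
    }
}
```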
>
> So I think standard error should be read in a separate thread before
> calling waitFor (that is, after ExecSource#stop calls destroy).
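The proposal above could look roughly like the following sketch: a background thread starts reading the child's stderr as soon as the process starts, so a later destroy followed by waitFor never depends on the pipe having room. This is not Flume's ExecSource code; DrainedProcess, stop and stderrBytesRead are hypothetical names, and it assumes a POSIX system with sh and head and Java 8+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of Flume: drains the child's stderr in a
// background thread from the moment the process starts.
class DrainedProcess {

    private final Process process;
    private final Thread stderrDrainer;
    private final AtomicLong stderrBytes = new AtomicLong();

    DrainedProcess(String... command) throws IOException {
        Process p = new ProcessBuilder(command).start();
        InputStream err = p.getErrorStream();
        this.process = p;
        // Start reading stderr immediately so the pipe can never fill up.
        this.stderrDrainer = new Thread(() -> drain(err), "stderr-drainer");
        this.stderrDrainer.setDaemon(true);
        this.stderrDrainer.start();
    }

    private void drain(InputStream in) {
        byte[] buf = new byte[8192];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                stderrBytes.addAndGet(n); // a real source would log or forward these bytes
            }
        } catch (IOException e) {
            // The stream was closed underneath us; nothing left to drain.
        } finally {
            try { in.close(); } catch (IOException ignored) { }
        }
    }

    /** Waits up to timeoutMillis for the child to exit on its own. */
    boolean waitFor(long timeoutMillis) throws InterruptedException {
        return process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Stop sequence: destroy, then waitFor, with stderr drained by the background thread throughout. */
    int stop(long drainTimeoutMillis) throws InterruptedException {
        process.destroy();
        int exitCode = process.waitFor();
        // The drainer ends at end-of-stream; bound the wait in case a
        // grandchild process still holds the pipe open.
        stderrDrainer.join(drainTimeoutMillis);
        return exitCode;
    }

    long stderrBytesRead() {
        return stderrBytes.get();
    }

    public static void main(String[] args) throws Exception {
        // Writes 4,000,000 bytes to stderr, far more than a typical pipe buffer holds.
        DrainedProcess dp = new DrainedProcess("sh", "-c", "exec head -c 4000000 /dev/zero 1>&2");
        boolean exited = dp.waitFor(10000);
        int code = dp.stop(5000);
        System.out.println(exited + " " + code + " " + dp.stderrBytesRead());
    }
}
```

With the same 4 MB stderr workload that blocks an undrained child, this version lets the child finish, and stop() returns the real exit code once the drainer reaches end-of-stream.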
>
> What do you think?
>
> Regards,
>
> YOU
>