Re: Stopping ExecSource takes very long time (about 6 hours)
Sorry for sending the wrong email.
 2014/01/21 15:17 "You Hoken" <[EMAIL PROTECTED]>:

> Hi,
>
> I am using ExecSource to run a resident shell program via the "rsh" command.
> The resident shell program is a simple script that runs "tail" on a log file
> located on the remote server (AIX) reached through "rsh".
>
> Flume: 1.3.1
> JDK: 1.6.0
> Linux executing Flume (ExecSource): SUSE Linux Enterprise Server 11 SP2
> AIX: V5.2
>
> In this setup, when I stop Flume, it takes a very long time (about 6 hours)
> to stop ExecSource.
>
> The details are as follows.
> It took about 6 hours between (1) and (2).
> (1) INFO  [node-shutdownHook] (org.apache.flume.source.ExecSource.stop:178)
>      - Stopping exec source with command:rsh serverXXX sh YYY.sh
> (2) INFO  [pool-4-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:307)
>      - Command rsh serverXXX sh YYY.sh] exited with 0
>
> This always happens.
> I guess the TCP keepalive setting of the OS (SUSE Linux) affects this situation,
> but I still don't know why it takes 6 hours to stop ExecSource.
>
> So, to find the cause, I debugged these processes, and the result is as
> follows.
>    1. ExecSource#stop:Process#destroy
>    2. ExecSource#stop:Process#waitFor (start waiting for response No.1)
>    3. ExecSource#run :Process#getErrorStream
>    4. ExecSource#run :Process#destroy
>    5. ExecSource#run :Process#waitFor (start waiting for response No.4)
>    6. ExecSource#run :Process#waitFor (end waiting for response No.4)
>    7. ExecSource#stop:Process#waitFor (end waiting for response No.1)
>
> You can see that the waitFor started in No.5 returns (No.6) before the one
> started in No.2 returns (No.7).
> It seems to me that the thread safety (synchronized (process)) is not working.
> Is this execution order correct?
> Do you think this execution order caused my problem?
>
> By debugging, I am now sure of the following.
> 1. Two threads (ExecSource#stop and ExecSource#run) are executed at the
>    same time.
> 2. ExecSource#stop seems to wait for a response at Process#waitFor after
>    java.lang.Process#destroy.
> 3. After Process#getErrorStream, ExecSource#run seems to wait for a response
>    at Process#waitFor after java.lang.Process#destroy.
>
> Given the above, I am worried that if the external process writes to standard
> error after being destroyed, the buffer on the client side might fill up and
> cause a deadlock at Process#waitFor.
>
> So I think standard error had better be read in another thread before
> executing waitFor (after executing destroy in ExecSource#stop).
>
> What do you think?
>
> regards,
>
> YOU
>
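
For illustration only, the suggestion above (read standard error in another thread before calling waitFor) could be sketched roughly as follows. This is not Flume's ExecSource code and it is not claimed to fix the 6-hour delay. The class StderrDrainSketch, the Drainer helper, the stopAndWait method, and the sh command in main are all hypothetical names and values chosen for the example. The only point it shows is that the child's output pipes are emptied by background threads, so the child cannot block on a full pipe while the parent waits for it to exit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicLong;

public class StderrDrainSketch {

    /** Hypothetical helper: reads a stream to EOF on a daemon thread and counts the bytes. */
    static final class Drainer extends Thread {
        private final InputStream in;
        final AtomicLong bytesRead = new AtomicLong();

        Drainer(InputStream in, String name) {
            super(name);
            this.in = in;
            setDaemon(true); // do not keep the JVM alive just for draining
        }

        @Override
        public void run() {
            byte[] buf = new byte[4096];
            try {
                int n;
                while ((n = in.read(buf)) != -1) {
                    bytesRead.addAndGet(n); // discard the data, keep only a count
                }
            } catch (IOException e) {
                // The stream was closed, typically because the child was destroyed.
            }
        }
    }

    /**
     * Hypothetical stop routine: the drainers must already be running, so the
     * pipes keep being emptied while this thread blocks in waitFor().
     */
    static int stopAndWait(Process process, Drainer... drainers) throws InterruptedException {
        process.destroy();
        int exit = process.waitFor();
        for (Drainer d : drainers) {
            d.join(1000); // bounded wait; a grandchild may still hold the pipe open
        }
        return exit;
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for "rsh serverXXX sh YYY.sh": a child that keeps writing to stderr.
        Process p = new ProcessBuilder("sh", "-c",
                "while true; do echo noise 1>&2; sleep 1; done").start();
        Drainer out = new Drainer(p.getInputStream(), "stdout-drainer");
        Drainer err = new Drainer(p.getErrorStream(), "stderr-drainer");
        out.start();
        err.start();
        Thread.sleep(2500);
        int exit = stopAndWait(p, out, err);
        System.out.println("exit=" + exit + ", stderr bytes=" + err.bytesRead.get());
    }
}
```

The drainers are started right after the process is launched rather than inside the stop routine, so there is no window in which stderr goes unread. Whether this changes the behaviour seen with rsh would still need to be checked on the real setup.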