MapReduce >> mail # user >> Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive


Re: tasktracker keep recevied KillJobAction and then delete unknown job while using hive
A poorly written job can easily overload a TaskTracker. The first question is why one TT has no problems and the other does. Take a look at the logs on that node. If you see messages like "0 slots free", raising the handler count could help.
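
To compare the two TaskTracker logs quickly, a small script along these lines may help. It is only a sketch: the two regular expressions are modelled on the log lines quoted further down in this thread, the "0 slots free" text is matched literally as quoted above, and the function name `summarize_tasktracker_log` is made up for illustration.

```python
import re

# Patterns modelled on the TaskTracker log lines quoted in this thread.
KILL_RE = re.compile(r"Received 'KillJobAction' for job: (job_\d+_\d+)")
UNKNOWN_RE = re.compile(r"Unknown job (job_\d+_\d+) being deleted")
# \b keeps "10 slots free" from being counted as "0 slots free".
ZERO_SLOTS_RE = re.compile(r"\b0 slots free")


def summarize_tasktracker_log(lines):
    """Count kill actions, unknown-job deletions and '0 slots free' lines."""
    kills_received = 0
    unknown_deleted = 0
    zero_slots_lines = 0
    jobs = set()
    for line in lines:
        kill = KILL_RE.search(line)
        if kill:
            kills_received += 1
            jobs.add(kill.group(1))
        unknown = UNKNOWN_RE.search(line)
        if unknown:
            unknown_deleted += 1
            jobs.add(unknown.group(1))
        if ZERO_SLOTS_RE.search(line):
            zero_slots_lines += 1
    return {
        "kills_received": kills_received,
        "unknown_deleted": unknown_deleted,
        "zero_slots_lines": zero_slots_lines,
        "jobs": sorted(jobs),
    }


# Two lines copied from the log excerpt in this thread.
sample = [
    "2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: "
    "Received 'KillJobAction' for job: job_201201301055_0381",
    "2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: "
    "Unknown job job_201201301055_0381 being deleted.",
]
summary = summarize_tasktracker_log(sample)
print(summary)
```

Running the same function over the log of the healthy node and the log of the affected node (pass an open file object instead of `sample`) should show whether only one of them accumulates these events.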

dfs.namenode.handler.count can be set to 15 or similar. 10 is very moderate.
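
To check which value is actually configured, something like the following sketch can read the property out of a Hadoop-style configuration file. The assumptions here: the property lives in the usual `<configuration>/<property>/<name>/<value>` layout (normally hdfs-site.xml), the fallback of 10 is the default mentioned in this thread rather than something the script looks up, and `handler_count` is a helper name invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumed fallback: the default value mentioned in this thread.
# Check hdfs-default.xml of your Hadoop version to confirm it.
ASSUMED_DEFAULT = 10


def handler_count(config_xml, name="dfs.namenode.handler.count",
                  default=ASSUMED_DEFAULT):
    """Return the value of `name` from Hadoop-style config XML text.

    Falls back to `default` when the property is not set in the file.
    """
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return int((prop.findtext("value") or "").strip())
    return default


# Example config text with the property raised to 15.
example = """<configuration>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>15</value>
  </property>
</configuration>"""

configured = handler_count(example)
unset = handler_count("<configuration/>")
print(configured, unset)
```

For a real cluster, read the text of the NameNode's hdfs-site.xml and pass it in; a changed value normally only takes effect after the NameNode is restarted.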

best,
 Alex  

--
Alexander Lorenz
http://mapredit.blogspot.com

On Feb 1, 2012, at 4:11 PM, Xiaobin She wrote:

> hi Alex,
>
> I did not set the value of dfs.namenode.handler.count in the config file, so it should be at the default value, probably 10.
>
> I only have two datanodes; is 10 not enough?
>
> And if it is not enough, why does the tasktracker keep receiving KillJobAction and deleting unknown jobs?
>
> thank you very much for your help!
>
> 2012/2/1 alo alt <[EMAIL PROTECTED]>
> How many namenode handlers (dfs.namenode.handler.count) have you defined for your cluster?
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Feb 1, 2012, at 12:25 PM, Xiaobin She wrote:
>
> >
> > hi Alex,
> >
> > I'm using jre 1.6.0_24
> >
> > with hadoop 0.20.0
> > hive 0.80
> >
> > thx
> >
> >
> > 2012/2/1 alo alt <[EMAIL PROTECTED]>
> > Hi,
> >
> > + hdfs-user (bcc'd)
> >
> > Which JRE version are you using?
> >
> > - Alex
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > On Feb 1, 2012, at 8:16 AM, Xiaobin She wrote:
> >
> > > hi ,
> > >
> > >
> > > I'm using hive to do some log analysis, and I have encountered a problem.
> > >
> > > My cluster has 3 nodes: one for NameNode/JobTracker and the other two for DataNode/TaskTracker.
> > >
> > > One of the tasktrackers repeatedly receives KillJobAction and then deletes unknown jobs.
> > >
> > > the logs look like:
> > >
> > > 2012-01-31 00:35:37,640 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0381
> > > 2012-01-31 00:35:37,640 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0381 being deleted.
> > > 2012-01-31 00:36:22,697 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0383
> > > 2012-01-31 00:36:22,698 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0383 being deleted.
> > > 2012-01-31 01:05:34,108 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0384
> > > 2012-01-31 01:05:34,108 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0384 being deleted.
> > > 2012-01-31 01:07:43,280 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_201201301055_0385
> > > 2012-01-31 01:07:43,280 WARN org.apache.hadoop.mapred.TaskTracker: Unknown job job_201201301055_0385 being deleted.
> > >
> > > This happens occasionally, and when it does, this tasktracker does nothing but keep receiving KillJobAction and deleting unknown jobs, so performance drops.
> > >
> > > To solve this problem, I have to restart the cluster,
> > > but obviously this is not a good solution.
> > >
> > > These jobs eventually run on the other tasktracker, where they run well and succeed.
> > >
> > > Has anybody encountered this problem, and can you give me some advice?
> > >
> > > Occasionally there are also some error logs like:
> > >
> > > 2012-01-31 13:11:40,183 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 55837: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
> > > java.io.IOException: Connection reset by peer
> > >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> > >         at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > >         at org.apache.hadoop.ipc.Server.channelRead(Server.java:1211)