Re: Killing hadoop jobs automatically
Dear Praveenesh,

I think there are only two ways to kill a job:
1- The kill command (not a perfect way, because you need to know the job id).
2- mapred.task.timeout (set your desired value in milliseconds on the "bin/hadoop jar" command line with -Dmapred.task.timeout=<value>).
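
A minimal sketch of both options (the job id, jar name, and driver class below are placeholders, and the -D flag assumes your driver is run through ToolRunner/GenericOptionsParser so generic options are picked up):

    # Option 1: find the job id, then kill that job
    hadoop job -list
    hadoop job -kill job_201201300001_0042

    # Option 2: per-task timeout for one run, in milliseconds (10 min here)
    bin/hadoop jar myjob.jar com.example.MyDriver \
        -Dmapred.task.timeout=600000 input output

Note that mapred.task.timeout is a per-task setting and fires only when a task reports no progress for that long; a task that is slow but keeps reporting progress will not be killed, and the setting does not cap total job runtime.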

This has sometimes happened to me too: not on all machines, but on some particular machines jobs execute more slowly than on others, which I think is caused by hardware problems.
As far as I know, shuffling is done by Hadoop itself, and we can only influence it by setting the output format class. Be aware that it is normal for some tasks to finish later than others, so don't be too sensitive about it; Hadoop manages all of this, and in Hadoop-based computation the overall result is what matters.
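
On the shuffle question quoted below: a few standard job-level knobs can help without changing user code, e.g. compressing map output to shrink shuffle traffic, enlarging the map-side sort buffer, and raising the number of parallel reduce-side fetches. A sketch using the Hadoop 1.x parameter names (the jar and class names are placeholders, and the values are illustrative only; tune them for your cluster):

    bin/hadoop jar myjob.jar com.example.MyDriver \
        -Dmapred.compress.map.output=true \
        -Dio.sort.mb=200 \
        -Dio.sort.factor=25 \
        -Dmapred.reduce.parallel.copies=10 \
        input output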

I hope this helps.

Good Luck,
Masoud,
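
P.S. On the original question quoted below (killing long-running jobs automatically from a shell script), here is a hypothetical cron-driven watchdog sketch. It assumes the Hadoop 1.x "hadoop job -list" column layout (JobId, State, StartTime in epoch milliseconds), so verify the columns against your version:

    #!/bin/bash
    # Sketch only: kill any running job older than MAX_SECONDS.
    MAX_SECONDS=3600
    now=$(date +%s)
    hadoop job -list | grep '^job_' | while read -r jobid state start _; do
        age=$(( now - start / 1000 ))          # StartTime is in ms
        if [ "$age" -gt "$MAX_SECONDS" ]; then
            echo "Killing $jobid after ${age}s"
            hadoop job -kill "$jobid"
        fi
    done

Run it from cron every few minutes. It is deliberately indiscriminate: any job older than the threshold is killed, including slow but healthy ones, which is exactly the behavior asked for.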
On 01/30/2012 06:07 PM, praveenesh kumar wrote:
> @ Harsh -
>
> Yeah, mapred.task.timeout is the valid option, but for some reason it's
> not working the way it should, and I am not sure what the cause could
> be. The thing is, my jobs are running fine; they are just slow in the
> shuffle phase, sometimes, not every time. So I was thinking, as an
> admin, can we control the running of jobs, just as a test, where we
> kill not only the jobs that are hanging but also the jobs that are
> taking more execution time than expected? The problem in my case is
> that the end users don't want to go through the pain of managing and
> controlling jobs on Hadoop. They want all this job handling to happen
> automatically, and that made me think along these lines (which I know
> is not the best way).
>
> Anyway, going away from the topic: is there any way I can improve my
> shuffling (through configuration parameters only, given that the users
> don't know about minimizing the number of key/value pairs)?
>
> Thanks,
> Praveenesh
>
> On Mon, Jan 30, 2012 at 1:06 PM, Masoud <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Every Map/Reduce app has a Reporter; you can set the configuration
>> parameter mapred.task.timeout to your desired value.
>>
>> Good Luck.
>>
>>
>> On 01/30/2012 04:14 PM, praveenesh kumar wrote:
>>
>>> Yeah, I am aware of that, but it requires you to explicitly monitor
>>> the job, look up the job id, and then run the hadoop job -kill
>>> command.
>>> What I want to know is: "Is there any way to do all this
>>> automatically, by providing some timer or something, so that if my
>>> job takes more than some predefined time, it gets killed
>>> automatically?"
>>>
>>> Thanks,
>>> Praveenesh
>>>
>>> On Mon, Jan 30, 2012 at 12:38 PM, Prashant Kommireddi
>>> <[EMAIL PROTECTED]>wrote:
>>>
>>>> You might want to take a look at the kill command: "hadoop job -kill
>>>> <jobid>".
>>>>
>>>> Prashant
>>>>
>>>> On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar <[EMAIL PROTECTED]> wrote:
>>>>> Is there any way we can kill hadoop jobs that are taking too long
>>>>> to execute?
>>>>>
>>>>> What I want to achieve is: if some job runs for more than
>>>>> "_some_predefined_timeout_limit_", it should be killed
>>>>> automatically.
>>>>>
>>>>> Is it possible to achieve this through shell scripts or any other way?
>>>>>
>>>>> Thanks,
>>>>> Praveenesh
>>>>>
>>>>>