Hadoop >> mail # user >> Number of retries


Re: Number of retries
Hi Mohit
     To add on, duplicates won't appear if your output is written to an HDFS file. When one attempt of a task completes, only that attempt's output file is copied to the final output destination; the files generated by the other task attempts that were killed are simply discarded.
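To make the mechanism above concrete, here is a toy sketch in plain Java (java.nio only, no Hadoop APIs) of the commit pattern described: each speculative attempt writes to its own temporary directory, and only the winning attempt's file is promoted to the final output path. The directory and file names here are made up for illustration; Hadoop's real OutputCommitter uses its own layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommitSketch {
    // Simulates two speculative attempts of the same task writing
    // identical output, then committing only the winner.
    public static Path runAttempts(Path outputDir) throws IOException {
        // Each attempt writes under its own temporary directory.
        Path a0 = Files.createDirectories(outputDir.resolve("_temporary/attempt_0"));
        Path a1 = Files.createDirectories(outputDir.resolve("_temporary/attempt_1"));
        Files.writeString(a0.resolve("part-00000"), "row1\nrow2\n");
        Files.writeString(a1.resolve("part-00000"), "row1\nrow2\n");

        // Attempt 0 finishes first: only its file is moved to the
        // final output location.
        Path committed = outputDir.resolve("part-00000");
        Files.move(a0.resolve("part-00000"), committed);

        // The losing attempt's output is deleted, never merged,
        // so no duplicate rows reach the final file.
        Files.delete(a1.resolve("part-00000"));
        return committed;
    }
}
```

A database has no equivalent "promote on commit" step when tasks write rows directly, which is why duplicates show up there but not in the HDFS file.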

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: "Bejoy KS" <[EMAIL PROTECTED]>
Date: Thu, 22 Mar 2012 19:55:55
To: <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
Subject: Re: Number of retries

Mohit
      If you are writing to a db from a job, with each write committed atomically as it happens, this would pop up. You can avoid it only by disabling speculative execution.
Drilling down from the web UI to the task level will show you which tasks had multiple attempts.
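For reference, a sketch of that config switch, assuming the pre-YARN property names that were current around this thread's vintage (Hadoop 1.x era; on later releases the names changed to mapreduce.map.speculative and mapreduce.reduce.speculative):

```xml
<!-- mapred-site.xml, or set per job: disables speculative task
     attempts so only one attempt of each task runs, barring failures -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

Note that attempts retried after a task failure can still write twice; speculative execution is only one source of duplicate attempts, so database writes may also need to be made idempotent.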

------Original Message------
From: Mohit Anchlia
To: [EMAIL PROTECTED]
ReplyTo: [EMAIL PROTECTED]
Subject: Number of retries
Sent: Mar 23, 2012 01:21

I am seeing a weird problem where I am seeing duplicate rows in the
database. I am wondering if some internal retries might be causing this.
Is there a way to look at which tasks were retried? I am not sure what
else might cause it, because when I look at the output data I don't see
any duplicates in the file.
