Hadoop dev mailing list


Re: Checksum Error during Reduce Phase hadoop-1.0.2
Also, do you have ECC RAM?
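Why would a bad disk or memory without ECC show up as a ChecksumException? In the stack trace quoted below, the exception is thrown by IFileInputStream while the reducer reads back intermediate data, so those reads are checksum-verified. If a bit flips anywhere between the write and the read, the checksum recomputed on read no longer matches the stored one. The sketch below models that idea with a single CRC32 trailer per segment. The layout is an assumption for illustration, not Hadoop's actual IFile format, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/**
 * Simplified, hypothetical model of a checksummed segment:
 * payload bytes followed by a 4-byte big-endian CRC32 trailer.
 * This is NOT Hadoop's IFile format; it only illustrates the check.
 */
public class ChecksumDemo {

    /** Appends the CRC32 of the payload as a 4-byte trailer. */
    static byte[] writeSegment(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt((int) crc.getValue()); // keep the low 32 bits of the CRC
        return buf.array();
    }

    /** Recomputes the CRC32 over the payload and compares it with the stored trailer. */
    static boolean verifySegment(byte[] segment) {
        if (segment.length < 4) {
            return false; // too short to even hold a trailer
        }
        int len = segment.length - 4;
        CRC32 crc = new CRC32();
        crc.update(segment, 0, len);
        int stored = ByteBuffer.wrap(segment, len, 4).getInt();
        return stored == (int) crc.getValue();
    }

    public static void main(String[] args) {
        byte[] payload = "key1\tvalue1\nkey2\tvalue2\n".getBytes(StandardCharsets.UTF_8);
        byte[] segment = writeSegment(payload);
        System.out.println("intact segment verifies: " + verifySegment(segment));

        // Simulate a single bit flip, as a bad disk sector or a memory error might cause.
        // CRC32 is guaranteed to detect any single-bit error.
        byte[] corrupted = Arrays.copyOf(segment, segment.length);
        corrupted[5] ^= 0x01;
        System.out.println("corrupted segment verifies: " + verifySegment(corrupted));
    }
}
```

Running main prints "intact segment verifies: true" followed by "corrupted segment verifies: false". The checksum only tells you that the bytes changed after they were written. It does not tell you whether the disk, memory, or something else changed them, which is why narrowing the failures down to particular nodes is the next step.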

On Aug 16, 2012, at 11:34 AM, Arun C Murthy wrote:

> Primarily, it could be caused by a corrupt disk, which is why checking whether it happens on specific node(s) can help.
>
> Arun
>
> On Aug 16, 2012, at 10:04 AM, Pavan Kulkarni wrote:
>
>> Harsh,
>>
>> I see this on a couple of nodes. But what could be causing this error? Any
>> ideas? Thanks
>>
>> On Sun, Aug 12, 2012 at 9:06 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>>
>>> Hi Pavan,
>>>
>>> Do you see this happen on a specific node every time (i.e. when the
>>> reducer runs there)?
>>>
>>> On Fri, Aug 10, 2012 at 11:43 PM, Pavan Kulkarni
>>> <[EMAIL PROTECTED]> wrote:
>>>> Hi,
>>>>
>>>> I am running TeraSort on an 8-node cluster. The map phase completes,
>>>> but when the reduce phase reaches about 68-70% I get the following error:
>>>>
>>>> 12/08/10 11:02:36 INFO mapred.JobClient: Task Id : attempt_201208101018_0001_r_000027_0, Status : FAILED
>>>> java.lang.RuntimeException: problem advancing post rec#38320220
>>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
>>>>         at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
>>>>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>>>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>         at javax.security.auth.Subject.doAs(Subject.java:416)
>>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>>>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>>> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum Error
>>>>         at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
>>>>         at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
>>>>         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:374)
>>>>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$RawKVIteratorReader.next(ReduceTask.java:2531)
>>>>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
>>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1253)
>>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1212)
>>>>         ... 10 more
>>>>
>>>> I came across someone facing the same issue
>>>> <http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201001.mbox/%[EMAIL PROTECTED]%3E>
>>>> in the mailing-list archives, and he seemed to resolve it by listing the
>>>> hostnames in the /etc/hosts file, but all my nodes already have correct
>>>> hostname entries in /etc/hosts,

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
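Harsh's question and Arun's follow-up both come down to checking whether the failed reduce attempts cluster on particular nodes. Below is a minimal sketch of that bookkeeping. It assumes you have already collected the host of each failed attempt by hand, for example from the task-attempt pages of the JobTracker web UI. The class name, method names, threshold, and sample hosts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: tallies failed task attempts per host to spot suspect nodes. */
public class FailuresByHost {

    /** Counts how many failed attempts ran on each host, sorted by host name. */
    static Map<String, Integer> countByHost(List<String> hostsOfFailedAttempts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String host : hostsOfFailedAttempts) {
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the hosts (in sorted order) with at least minFailures failed attempts. */
    static List<String> suspectHosts(Map<String, Integer> counts, int minFailures) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minFailures) {
                suspects.add(e.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the host each failed reduce attempt ran on.
        List<String> hosts = Arrays.asList("node3", "node5", "node3", "node3", "node5", "node1");
        Map<String, Integer> counts = countByHost(hosts);
        System.out.println(counts);                               // {node1=1, node3=3, node5=2}
        System.out.println("suspects: " + suspectHosts(counts, 2)); // suspects: [node3, node5]
    }
}
```

If the same one or two hosts keep appearing, those nodes are the ones to check first for disk or memory faults. If failures are spread evenly across the cluster, a per-node hardware fault becomes less likely.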