Hadoop, mail # dev - Checksum Error during Reduce Phase hadoop-1.0.2


Re: Checksum Error during Reduce Phase hadoop-1.0.2
Arun C Murthy 2012-08-16, 18:34
Primarily, it could be caused by a corrupt disk, which is why checking whether it happens on specific node(s) can help.

Arun
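
The quoted trace fails with a ChecksumException raised from IFileInputStream.doRead, meaning the intermediate bytes read back did not match the checksum recorded for them. As a rough illustration of why a bad disk shows up this way, here is a minimal sketch that assumes a CRC32-style checksum. It is not Hadoop's own code; the class and method names (ChecksumSketch, crc32Of, verify) and the sample record are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative only: NOT Hadoop's IFileInputStream implementation.
// All names and the sample record below are hypothetical.
class ChecksumSketch {

    // Compute a CRC32 over the whole buffer.
    static long crc32Of(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True when the data still matches the checksum recorded at write time.
    static boolean verify(byte[] data, long expected) {
        return crc32Of(data) == expected;
    }

    public static void main(String[] args) {
        byte[] record = "hypothetical intermediate record"
                .getBytes(StandardCharsets.UTF_8);

        // Checksum stored alongside the data when it was written.
        long stored = crc32Of(record);
        System.out.println("intact copy verifies:  " + verify(record, stored));

        // Simulate a disk or memory fault flipping a single bit on the way back.
        byte[] damaged = record.clone();
        damaged[5] ^= 0x01;
        System.out.println("damaged copy verifies: " + verify(damaged, stored));
    }
}
```

A single flipped bit is enough to make verification fail, which is consistent with the suggestion that the fault lies with the hardware on the node that stored or served the data rather than with the job itself.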

On Aug 16, 2012, at 10:04 AM, Pavan Kulkarni wrote:

> Harsh,
>
> I see this on a couple of nodes. But what may be the cause of this error? Any
> idea about it? Thanks
>
> On Sun, Aug 12, 2012 at 9:06 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Hi Pavan,
>>
>> Do you see this happen on a specific node every time (i.e. when the
>> reducer runs there)?
>>
>> On Fri, Aug 10, 2012 at 11:43 PM, Pavan Kulkarni
>> <[EMAIL PROTECTED]> wrote:
>>> Hi,
>>>
>>> I am running a Terasort with a cluster of 8 nodes. The map phase completes,
>>> but when the reduce phase is around 68-70% I get the following error.
>>>
>>> 12/08/10 11:02:36 INFO mapred.JobClient: Task Id : attempt_201208101018_0001_r_000027_0, Status : FAILED
>>> java.lang.RuntimeException: problem advancing post rec#38320220
>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
>>>         at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
>>>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:416)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum Error
>>>         at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
>>>         at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
>>>         at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
>>>         at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
>>>         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
>>>         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:374)
>>>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$RawKVIteratorReader.next(ReduceTask.java:2531)
>>>         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
>>>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1253)
>>>         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1212)
>>>         ... 10 more
>>>
>>> I came across someone facing the same issue
>>> <http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201001.mbox/%[EMAIL PROTECTED]%3E>
>>> in the mail-archives, and he seemed to resolve it by listing hostnames in
>>> the /etc/hosts file. All my nodes have correct hostname information in
>>> /etc/hosts, but I still have these reducers throwing the error.
>>> Any help regarding this issue is appreciated. Thanks
>>>
>>> --
>>>
>>> --With Regards
>>> Pavan Kulkarni

Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
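
One way to act on the "specific node(s)" question raised in this thread is to tally failed reduce attempts by the host they ran on. The sketch below assumes you have already collected (attempt id, host) pairs yourself, for example by reading them off the job's task pages. The class name FailureTally, the method countByHost, and every attempt id and host name in it are hypothetical placeholders, not values from this cluster.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: the input pairs are hypothetical placeholders that you
// would replace with the failed attempts observed on your own cluster.
class FailureTally {

    // Count failed attempts per host. Each row is {attemptId, host}.
    static Map<String, Integer> countByHost(String[][] failures) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] failure : failures) {
            counts.merge(failure[1], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] failures = {
            {"attempt_A", "node-a"},   // hypothetical
            {"attempt_B", "node-b"},   // hypothetical
            {"attempt_C", "node-a"},   // hypothetical
        };

        // A host that dominates this list is the first place to check disks.
        for (Map.Entry<String, Integer> e : countByHost(failures).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " failed attempt(s)");
        }
    }
}
```

If the failures cluster on one or two hosts, that supports the corrupt-disk explanation; if they are spread evenly across the cluster, another cause is worth considering.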