Is it possible that you have multiple MR jobs (or other HDFS clients)
operating on the same file paths that could cause a conflict if run
concurrently?

At MR job submission time, the MR client identifies the set of input
splits, which roughly correspond to the blocks of the input HDFS files.
(This is a simplified description, because CombineFileInputFormat or your
own custom InputFormat can complicate the picture, but this simplification
is fine for our purposes.) When map tasks launch, they read from the input
splits (the HDFS file blocks). If you have an MR job that decides one of
its input splits needs block X, and then another process decides to delete
the HDFS file containing block X before the map task that would read the
block launches, then you'd have a race condition that could trigger a
problem similar to this.

Typically, the solution is to design applications such that concurrent
deletes while reading from a particular HDFS file are not possible. For
example, you might code file deletion after the MR job that consumes those
files, so that you know nothing else is reading while you're trying to
delete.

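As a rough sketch of that ordering: the snippet below is plain Java with a local temp file standing in for HDFS, and the class and method names (DeleteAfterJob, runThenDelete) are made up for illustration. In a real driver, the blocking call would be something like job.waitForCompletion(true) and the delete would go through FileSystem.delete(path, true).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop: it only illustrates the ordering
// "finish the consumer first, delete afterwards".
final class DeleteAfterJob {

    /**
     * Runs the job to completion and deletes its input only if the job
     * succeeded. Returns true if the input was deleted.
     */
    static boolean runThenDelete(Callable<Boolean> job, Path input) throws Exception {
        // Block until the consumer has finished. In a Hadoop driver this is
        // where job.waitForCompletion(true) would go (assumption: new MR API).
        boolean succeeded = job.call();
        if (!succeeded) {
            // Keep the input so that the job can be retried.
            return false;
        }
        // No task of this job is still reading, so the delete cannot pull a
        // block out from under one of its map tasks.
        return Files.deleteIfExists(input);
    }

    public static void main(String[] args) throws Exception {
        Path input = Files.createTempFile("input", ".txt");
        Files.write(input, "some records".getBytes());
        // Stand-in job: reads the input and reports success.
        boolean deleted = runThenDelete(() -> Files.readAllBytes(input).length > 0, input);
        System.out.println("input deleted: " + deleted);
    }
}
```

This only protects against the job's own cleanup racing with its own tasks. If other jobs or clients may read the same paths, they would need to be sequenced the same way.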
BlockMissingException could also show up if you've lost all replicas of a
block, but this would be extremely rare for a typical deployment with a
replication factor of 3.

Hope this helps,
On Tue, May 14, 2013 at 2:20 PM, Public Network Services <
[EMAIL PROTECTED]> wrote:
> I am getting a BlockMissingException in a fairly simple application with a
> few mappers and reducers (see end of message).
> Looking around on the web has not helped much, including JIRA issues
> HDFS-767 and HDFS-1907. The configuration variable
> - dfs.client.baseTimeWindow.waitOn.BlockMissingException
> does not seem to make a difference, either.
> The BlockMissingException occurs in some of the runs, while in others
> execution completes normally, which signifies a possible concurrency issue.
> Any ideas?
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
> BP-390546703... file=...job.splitmetainfo
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
Public Network Services 2013-05-15, 08:39