Chris Nauroth 2013-05-15, 03:39
Public Network Services 2013-05-15, 08:39
Very reasonable scenario, but the application I run does not delete the
input files, so such a race condition could not manifest itself at any
point.

Funnily enough, while experimenting we changed some local path
permissions, and it seems to work now.
On Tue, May 14, 2013 at 8:39 PM, Chris Nauroth <[EMAIL PROTECTED]> wrote:
> Is it possible that you have multiple MR jobs (or other HDFS clients)
> operating on the same file paths that could cause a conflict if run
> concurrently?
>
> At MR job submission time, the MR client identifies the set of input
> splits, which roughly correspond to the blocks of the input HDFS files.
> (This is a simplified description, because CombineFileInputFormat or your
> own custom InputFormat can complicate the picture, but this simplification
> is fine for our purposes.) When map tasks launch, they read from the input
> splits (the HDFS file blocks). If you have an MR job that decides one of
> its input splits needs block X, and then another process decides to delete
> the HDFS file containing block X before the map task that would read the
> block launches, then you'd have a race condition that could trigger a
> problem similar to this.
>
> Typically, the solution is to design applications such that concurrent
> deletes while reading from a particular HDFS file are not possible. For
> example, you might code file deletion after the MR job that consumes those
> files, so that you know nothing else is reading while you're trying to
> delete.
>
> BlockMissingException could also show up if you've lost all replicas of a
> block, but this would be extremely rare for a typical deployment with a
> replication factor of 3.
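
The delete-after-the-job ordering suggested above can be sketched as follows. This is only an illustration, not code from the application discussed in this thread: it uses the local filesystem (java.nio.file) and a plain Callable as a stand-in for the MR job, and the names DeleteAfterJob and runThenDelete are hypothetical. In a real Hadoop application, the blocking call would typically be Job.waitForCompletion(true) and the deletion would go through FileSystem.delete(path, recursive).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

public class DeleteAfterJob {

    // Hypothetical helper: runs the job to completion first, and deletes
    // the inputs only if the job reported success. Because the delete
    // happens strictly after the job returns, no task of this job can
    // still be reading the files when they are removed.
    public static boolean runThenDelete(Callable<Boolean> job, List<Path> inputs)
            throws Exception {
        boolean succeeded = job.call(); // blocks until the job has finished
        if (succeeded) {
            for (Path input : inputs) {
                Files.deleteIfExists(input);
            }
        }
        return succeeded;
    }

    public static void main(String[] args) throws Exception {
        // Local temp file standing in for an HDFS input file (assumption).
        Path input = Files.createTempFile("input", ".txt");
        Files.write(input, "some records".getBytes());

        // Stand-in for the MR job: it just reads the input and reports success.
        Callable<Boolean> job = () -> Files.readAllBytes(input).length > 0;

        boolean ok = runThenDelete(job, Collections.singletonList(input));
        System.out.println("job succeeded: " + ok
                + ", input still exists: " + Files.exists(input));
    }
}
```

If the job fails, the inputs are left in place so that the job can be rerun. This ordering only protects against deletes issued by the same driver; other HDFS clients working on the same paths would still need to be coordinated separately.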
> Hope this helps,
> Chris Nauroth
> On Tue, May 14, 2013 at 2:20 PM, Public Network Services <
> [EMAIL PROTECTED]> wrote:
>> I am getting a BlockMissingException in a fairly simple application with
>> a few mappers and reducers (see end of message).
>> Searching the web has not helped much, including JIRA issues
>> HDFS-767 and HDFS-1907. The configuration variable
>> - dfs.client.baseTimeWindow.waitOn.BlockMissingException
>> does not seem to make a difference, either.
>> The BlockMissingException occurs in some runs, while in others
>> execution completes normally, which suggests a possible concurrency issue.
>> Any ideas?
>> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block:
>> BP-390546703... file=...job.splitmetainfo
>> at java.security.AccessController.doPrivileged(Native Method)