Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException
No, my job does not write files directly to disk. It simply goes to some
web pages, reads the data (in the reduce phase), and parses the JSON into
Thrift objects, which are emitted to HDFS via the standard MultipleOutputs
API.

Any idea why Hadoop would throw the AlreadyBeingCreatedException?
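
For reference, here is a stripped-down sketch of the reducer pattern I
described; the class names, the "parsed" output name, and the
fetchAndParse helper are placeholders rather than the real code:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class ParseReducer extends Reducer<Text, Text, NullWritable, Text> {
    private MultipleOutputs<NullWritable, Text> out;

    @Override
    protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> urls, Context context)
            throws IOException, InterruptedException {
        for (Text url : urls) {
            // fetch the page and parse its JSON into a Thrift object;
            // a plain Text stands in here for the serialized result
            Text parsed = fetchAndParse(url);
            // "parsed" is registered in the driver via
            // MultipleOutputs.addNamedOutput(job, "parsed", ...)
            out.write("parsed", NullWritable.get(), parsed);
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        out.close(); // flush and close the underlying record writers
    }

    private Text fetchAndParse(Text url) {
        return new Text(url); // stand-in for HTTP fetch + JSON -> Thrift
    }
}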
On Mon, Apr 2, 2012 at 2:52 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> What does your job do? Create files directly on HDFS? If so, do you
> follow this method?:
> A local filesystem may not complain if you re-create an existing file.
> HDFS' behavior here is different. This simple Python test is what I mean:
> >>> a = open('a', 'w')
> >>> a.write('f')
> >>> b = open('a', 'w')
> >>> b.write('s')
> >>> a.close(), b.close()
> >>> open('a').read()
> 's'
> On a local FS the second open() silently truncates the file and the test
> runs without complaint; on HDFS the second create would fail instead,
> which is exactly what AlreadyBeingCreatedException indicates.
> Hence it is best to use the FileOutputCommitter framework as detailed
> in the mentioned link.
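>
> In case it helps, here is a rough, untested sketch of that pattern with
> the new (org.apache.hadoop.mapreduce) API; SideFileReducer and the
> "side-data" file name are placeholders, not anything your job must use:
>
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Reducer;
> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
> public class SideFileReducer extends Reducer<Text, Text, Text, Text> {
>     private FSDataOutputStream side;
>
>     @Override
>     protected void setup(Context context)
>             throws IOException, InterruptedException {
>         // Each task attempt gets its own work directory, and the
>         // FileOutputCommitter promotes its contents to the job output
>         // directory only when the attempt commits. Retried or speculative
>         // attempts therefore never race to create the same HDFS path.
>         Path workDir = FileOutputFormat.getWorkOutputPath(context);
>         Path sideFile = new Path(workDir, "side-data");
>         FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
>         side = fs.create(sideFile, false); // false = fail if it exists
>     }
>
>     @Override
>     protected void reduce(Text key, Iterable<Text> values, Context context)
>             throws IOException, InterruptedException {
>         for (Text v : values) {
>             side.writeBytes(key + "\t" + v + "\n");
>         }
>     }
>
>     @Override
>     protected void cleanup(Context context) throws IOException {
>         side.close();
>     }
> }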
> On Mon, Apr 2, 2012 at 7:09 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> > Hi guys:
> > I have a MapReduce job that runs normally on the local filesystem from
> > Eclipse, *but* it fails on HDFS running in pseudo-distributed mode.
> > The exception I see is
> > *org.apache.hadoop.ipc.RemoteException:
> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*
> > Any thoughts on why this might occur in pseudo-distributed mode, but not
> > on the regular filesystem?
> Harsh J