Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'
Thanks J: just curious how you came to hypothesize (1) (i.e., that
threads are involved and the API components aren't thread safe in my
version of Hadoop).

I think that's a really good guess, and I would like to be able to
make those sorts of intelligent hypotheses myself. Any reading you can
point me to for further enlightenment?

On Mon, Apr 2, 2012 at 3:16 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Jay,
>
> Without seeing the whole stack trace, all I can say about the cause
> of that exception in a job is:
>
> 1. You're using threads, and the API components you are using aren't
> thread safe in your version of Hadoop (see the sketch below).
> 2. Files are being written out to HDFS directories without following
> the OutputCommitter rules. (This is ruled out, per your response.)
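>
> As a minimal sketch of what (1) can look like in practice (an
> illustrative race, not a claim about your code): two threads share
> one FileSystem instance and race to create the same path. On HDFS the
> second create() fails because the first writer still holds the lease
> on the file, and that surfaces as AlreadyBeingCreatedException inside
> a RemoteException.
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class CreateRace {
>     public static void main(String[] args) throws IOException {
>       final FileSystem fs = FileSystem.get(new Configuration());
>       final Path p = new Path("/tmp/out/part-00000"); // illustrative path
>       Runnable w = new Runnable() {
>         public void run() {
>           try {
>             // A local FS silently truncates on re-create; HDFS refuses
>             // while the file is still open for writing.
>             fs.create(p).writeBytes("x");
>           } catch (IOException e) {
>             e.printStackTrace();
>           }
>         }
>       };
>       new Thread(w).start();
>       new Thread(w).start();
>     }
>   }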
>
> On Mon, Apr 2, 2012 at 7:35 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> > No, my job does not write files directly to disk. It simply goes to
> > some web pages, reads data (in the reducer phase), and parses JSON
> > into Thrift objects, which are emitted to HDFS files via the
> > standard MultipleOutputs API (simplified sketch below).
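> >
> > Simplified, the write path looks something like this (illustrative
> > names, not the actual code; "thriftOut" would be registered on the
> > Job with MultipleOutputs.addNamedOutput):
> >
> >   import java.io.IOException;
> >   import org.apache.hadoop.io.Text;
> >   import org.apache.hadoop.mapreduce.Reducer;
> >   import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
> >
> >   public class ParseReducer extends Reducer<Text, Text, Text, Text> {
> >     private MultipleOutputs<Text, Text> mos;
> >
> >     @Override
> >     protected void setup(Context context) {
> >       // One MultipleOutputs per task attempt; it writes under the
> >       // attempt's work directory, so the committer handles promotion.
> >       mos = new MultipleOutputs<Text, Text>(context);
> >     }
> >
> >     @Override
> >     protected void reduce(Text key, Iterable<Text> values, Context ctx)
> >         throws IOException, InterruptedException {
> >       for (Text v : values) {
> >         mos.write("thriftOut", key, v); // parsed/converted value
> >       }
> >     }
> >
> >     @Override
> >     protected void cleanup(Context context)
> >         throws IOException, InterruptedException {
> >       mos.close(); // forgetting this leaves files open for writing
> >     }
> >   }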
> >
> > Any idea why hadoop would throw the "AlreadyBeingCreatedException" ?
> >
> > On Mon, Apr 2, 2012 at 2:52 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> >> Jay,
> >>
> >> What does your job do? Create files directly on HDFS? If so, do you
> >> follow this method?
> >>
> >>
> >> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
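> >>
> >> As a minimal sketch of that FAQ pattern, assuming the old-style
> >> mapred API (the new mapreduce FileOutputFormat has an equivalent
> >> getWorkOutputPath; the side-file name is illustrative):
> >>
> >>   import java.io.IOException;
> >>   import org.apache.hadoop.fs.FSDataOutputStream;
> >>   import org.apache.hadoop.fs.FileSystem;
> >>   import org.apache.hadoop.fs.Path;
> >>   import org.apache.hadoop.mapred.FileOutputFormat;
> >>   import org.apache.hadoop.mapred.JobConf;
> >>
> >>   public class SideFiles {
> >>     // Call this from inside a map or reduce task. Files created
> >>     // under the task attempt's work directory are promoted to the
> >>     // final output directory by the OutputCommitter only if the
> >>     // attempt succeeds, so retried or speculative attempts never
> >>     // collide on the same HDFS path.
> >>     static void writeSideFile(JobConf job) throws IOException {
> >>       Path workDir = FileOutputFormat.getWorkOutputPath(job);
> >>       FileSystem fs = workDir.getFileSystem(job);
> >>       FSDataOutputStream out = fs.create(new Path(workDir, "side.json"));
> >>       out.writeBytes("...");
> >>       out.close();
> >>     }
> >>   }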
> >>
> >> A local filesystem may not complain if you re-create an existing file.
> >> HDFS' behavior here is different. This simple Python test is what I
> >> mean:
> >> >>> a = open('a', 'w')
> >> >>> a.write('f')
> >> >>> b = open('a', 'w')
> >> >>> b.write('s')
> >> >>> a.close(), b.close()
> >> >>> open('a').read()
> >> 's'
> >>
> >> Hence it is best to use the FileOutputCommitter framework, as
> >> detailed in the link above.
> >>
> >> On Mon, Apr 2, 2012 at 7:09 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> >> > Hi guys:
> >> >
> >> > I have a MapReduce job that runs normally on the local file system
> >> > from Eclipse, *but* it fails on HDFS running in pseudo-distributed
> >> > mode.
> >> >
> >> > The exception I see is
> >> >
> >> > *org.apache.hadoop.ipc.RemoteException:
> >> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*
> >> >
> >> >
> >> > Any thoughts on why this might occur in pseudo-distributed mode,
> >> > but not on the regular local file system?
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
> >
> >
> >
> > --
> > Jay Vyas
> > MMSB/UCHC
>
>
>
> --
> Harsh J
>

--
Jay Vyas
MMSB/UCHC