Hadoop, mail # user - org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'


Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'
Harsh J 2012-04-02, 14:16
Jay,

Without seeing the whole stack trace, all I can offer as possible causes
for that exception from a job are:

1. You're using threads and the API components you are using aren't
thread-safe in your version of Hadoop.
2. Files are being written out to HDFS directories without following
the OutputCommitter (OC) rules. (This is ruled out, per your response.)
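
To make both causes concrete, here is a rough local-filesystem analogy in Python. It is not Hadoop code and does not use any Hadoop API. Exclusive-create mode ('x') is only a stand-in for HDFS refusing a second create on a file that another writer still has open, and the helper names, the "_temporary" directory layout, and the attempt IDs are all made up for illustration. It shows two writers colliding on one path, then each writer using its own attempt directory and one of them being promoted on commit, which is roughly the idea behind the committer approach:

```python
import os
import tempfile


def write_via_attempt_dir(out_dir, attempt_id, name, data):
    """Write into a per-attempt directory (hypothetical layout); return the temp path."""
    attempt_dir = os.path.join(out_dir, "_temporary", attempt_id)
    os.makedirs(attempt_dir, exist_ok=True)
    tmp_path = os.path.join(attempt_dir, name)
    # 'x' fails if the path already exists, so a collision would be loud.
    with open(tmp_path, "x") as f:
        f.write(data)
    return tmp_path


def commit(out_dir, tmp_path):
    """Promote one attempt's file into the final output directory."""
    final_path = os.path.join(out_dir, os.path.basename(tmp_path))
    os.replace(tmp_path, final_path)
    return final_path


def demo():
    out_dir = tempfile.mkdtemp()
    shared = os.path.join(out_dir, "part-00000")

    # Two writers targeting the same path: the second create is rejected
    # while the first writer still holds the file open.
    first = open(shared, "x")
    try:
        open(shared, "x")
        collided = False
    except FileExistsError:
        collided = True
    finally:
        first.close()
    os.remove(shared)

    # Each attempt writes under its own directory, so the paths never clash.
    a = write_via_attempt_dir(out_dir, "attempt_0", "part-00000", "from attempt 0")
    b = write_via_attempt_dir(out_dir, "attempt_1", "part-00000", "from attempt 1")

    # Only the committed attempt reaches the final location.
    final_path = commit(out_dir, a)
    with open(final_path) as f:
        content = f.read()
    return collided, a != b, content


if __name__ == "__main__":
    print(demo())
```

Whether your job hits this through threads or through output paths shared between tasks, the stack trace should show which file is being created twice.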

On Mon, Apr 2, 2012 at 7:35 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> No, my job does not write files directly to disk. It simply goes to some
> web pages, reads data (in the reducer phase), and parses JSON into Thrift
> objects, which are emitted via the standard MultipleOutputs API to HDFS
> files.
>
> Any idea why Hadoop would throw the "AlreadyBeingCreatedException"?
>
> On Mon, Apr 2, 2012 at 2:52 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Jay,
>>
>> What does your job do? Create files directly on HDFS? If so, do you
>> follow this method?:
>>
>> http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
>>
>> A local filesystem may not complain if you re-create an existing file.
>> HDFS' behavior here is different. This simple Python test is what I
>> mean:
>> >>> a = open('a', 'w')
>> >>> a.write('f')
>> >>> b = open('a', 'w')
>> >>> b.write('s')
>> >>> a.close(), b.close()
>> >>> open('a').read()
>> 's'
>>
>> Hence it is best to use the FileOutputCommitter framework as detailed
>> in the mentioned link.
>>
>> On Mon, Apr 2, 2012 at 7:09 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>> > Hi guys:
>> >
>> > I have a MapReduce job that runs normally on the local file system from
>> > Eclipse, *but* it fails on HDFS running in pseudo-distributed mode.
>> >
>> > The exception I see is
>> >
>> > *org.apache.hadoop.ipc.RemoteException:
>> > org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:*
>> >
>> >
>> > Any thoughts on why this might occur in pseudo-distributed mode, but not
>> > in the regular file system?
>>
>>
>>
>> --
>> Harsh J
>>
>
>
>
> --
> Jay Vyas
> MMSB/UCHC

--
Harsh J