Re: Question Regarding FileAlreadyExistsException
Daniel,

Perhaps you want your OutputFormat set as NullOutputFormat. That does
not carry any checks for output directory pre-existence.
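
A minimal sketch of that setup, assuming the newer
org.apache.hadoop.mapreduce API. The class name NullOutputJobSketch and
the job name are made up for illustration. It only shows the output
format setting; it does not address how files written through
MultipleOutputs end up being committed, so that part needs testing.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical driver helper; the name is an assumption, not from the thread.
public class NullOutputJobSketch {

    // Builds a Job whose OutputFormat is NullOutputFormat, so no
    // output-directory existence check runs at submission time.
    public static Job configure(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf, "date-partitioned-output"); // illustrative name
        job.setJarByClass(NullOutputJobSketch.class);
        // No FileOutputFormat.setOutputPath(...) call is made here.
        job.setOutputFormatClass(NullOutputFormat.class);
        return job;
    }
}
```

Mapper, reducer and input settings are left out and would be added to
the returned Job as usual.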

On Thu, Aug 23, 2012 at 9:47 PM, Daniel Hoffman
<[EMAIL PROTECTED]> wrote:
> Well, I'm using the MultipleOutputs capability to create a directory
> structure with dates.
> So I'm managing this myself.
>
> What I've found, and I could be doing this wrong... is that I still have to
> tell the Tool that I want to use a:
> TextOutputFormat or a FileOutputFormat, and then, have to tell the
> respective formats that I want to use some directory.
>
> IE:
> TextOutputFormat.setOutputPath(job, new Path("/foo/bar/"));
>
> As a workaround, I just made a temp directory at /tmp/datetimestamp.
>
> It doesn't make much sense though, since the reducer uses multiple output
> formats to make an entirely different directory structure. Of course, I'm
> probably either not following the M/R paradigm or just doing it wrong.
>
> The FileAlreadyExistsException was applicable to my "/foo/bar" directory
> which had very little to do with my "genuine" output.
>
>
> Dan
>
> On Thu, Aug 23, 2012 at 9:40 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> I think this specific behavior irritates a lot of new users. We may as
>> well provide a Generic Option to overwrite the output directory if
>> set. That way, we at least help avoid typing a whole delete command.
>> If you agree, please file an improvement request against the MAPREDUCE
>> project on the ASF JIRA.
>>
>> On Thu, Aug 23, 2012 at 6:58 PM, Bertrand Dechoux <[EMAIL PROTECTED]>
>> wrote:
>> > I don't think so. The client is responsible for deleting the resource
>> > beforehand, if it might exist.
>> > Correct me if I am wrong.
>> >
>> > Higher-level solutions (such as Cascading) usually provide a way to
>> > define a strategy to handle it: KEEP, REPLACE, UPDATE ...
>> >
>> > http://docs.cascading.org/cascading/2.0/javadoc/cascading/tap/SinkMode.html
>> >
>> > Regards
>> >
>> > Bertrand
>> >
>> > On Thu, Aug 23, 2012 at 3:15 PM, Daniel Hoffman <[EMAIL PROTECTED]> wrote:
>> >
>> >> With respect to the FileAlreadyExistsException which occurs when a
>> >> duplicate directory is discovered by an OutputFormat,
>> >> is there a Hadoop property that is accessible by the client to disable
>> >> this behavior?
>> >>
>> >> e.g., disable.file.already.exists.behaviour=true
>> >>
>> >> Thank You
>> >> Daniel G. Hoffman
>> >>
>> >
>> >
>> >
>> > --
>> > Bertrand Dechoux
>>
>>
>>
>> --
>> Harsh J
>>

--
Harsh J