Hadoop >> mail # user >> Question Regarding FileAlreadyExistsException


Daniel Hoffman 2012-08-23, 13:15
Bertrand Dechoux 2012-08-23, 13:28
Harsh J 2012-08-23, 13:40
Daniel Hoffman 2012-08-23, 16:17

Re: Question Regarding FileAlreadyExistsException
Daniel,

Perhaps you want to set your job's OutputFormat to NullOutputFormat. It
does not check whether the output directory already exists.
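
A minimal sketch of what this suggestion might look like in a job driver, assuming the newer org.apache.hadoop.mapreduce API from Hadoop 2.x or later (on 1.x, new Job(conf, name) would replace Job.getInstance). The class name NullOutputSetup and the job name "dated-outputs" are hypothetical, chosen only for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical driver helper; the names here are illustrative, not from the thread.
public final class NullOutputSetup {
    private NullOutputSetup() {}

    /** Builds a job whose job-level OutputFormat is NullOutputFormat. */
    public static Job newJob(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf, "dated-outputs");
        // NullOutputFormat's checkOutputSpecs is a no-op, so job submission
        // does not fail because some output directory already exists.
        job.setOutputFormatClass(NullOutputFormat.class);
        return job;
    }
}
```

Whether files written through MultipleOutputs are committed as expected when the job-level format is NullOutputFormat is not confirmed in this thread, so it is worth checking on a small job first.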

On Thu, Aug 23, 2012 at 9:47 PM, Daniel Hoffman
<[EMAIL PROTECTED]> wrote:
> Well, I'm using the MultipleOutputs capability to create a directory
> structure with dates, so I'm managing this myself.
>
> What I've found, and I could be doing this wrong, is that I still have to
> tell the Tool that I want to use a TextOutputFormat or a FileOutputFormat,
> and then have to tell the respective format that I want to use some
> directory.
>
> I.e.:
> TextOutputFormat.setOutputPath(job, new Path("/foo/bar/"));
>
> As a workaround, I just made a temp directory at /tmp/datetimestamp.
>
> It doesn't make much sense though, since the reducer uses multiple output
> formats to make an entirely different directory structure. Of course, I'm
> probably either not following the M/R paradigm or just doing it wrong.
>
> The FileAlreadyExistsException was applicable to my "/foo/bar" directory,
> which had very little to do with my "genuine" output.
>
>
> Dan
>
> On Thu, Aug 23, 2012 at 9:40 AM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> I think this specific behavior irritates a lot of new users. We may as
>> well provide a Generic Option to overwrite the output directory if
>> set. That way, we at least help avoid typing a whole delete command.
>> If you agree, please file an improvement request against the MAPREDUCE
>> project on the ASF JIRA.
>>
>> On Thu, Aug 23, 2012 at 6:58 PM, Bertrand Dechoux <[EMAIL PROTECTED]>
>> wrote:
>> > I don't think so. The client is responsible for deleting the resource
>> > before, if it might exist.
>> > Correct me if I am wrong.
>> >
>> > Higher-level solutions (such as Cascading) usually provide a way to
>> > define a strategy to handle it: KEEP, REPLACE, UPDATE ...
>> >
>> > http://docs.cascading.org/cascading/2.0/javadoc/cascading/tap/SinkMode.html
>> >
>> > Regards
>> >
>> > Bertrand
>> >
>> > On Thu, Aug 23, 2012 at 3:15 PM, Daniel Hoffman <[EMAIL PROTECTED]> wrote:
>> >
>> >> With respect to the FileAlreadyExistsException which occurs when a
>> >> duplicate directory is discovered by an OutputFormat,
>> >> is there a Hadoop property that is accessible by the client to disable
>> >> this behavior?
>> >>
>> >> I.e., disable.file.already.exists.behaviour=true
>> >>
>> >> Thank You
>> >> Daniel G. Hoffman
>> >>
>> >
>> >
>> >
>> > --
>> > Bertrand Dechoux
>>
>>
>>
>> --
>> Harsh J
>>

--
Harsh J
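
For the approach Bertrand describes in the quoted thread, where the client deletes the output directory itself before submitting the job, a minimal sketch could look like the following. It uses Hadoop's FileSystem API; the class name OutputDirCleaner and the method deleteIfExists are hypothetical helpers, not part of Hadoop. The delete is recursive, so it should only ever be pointed at a directory the job owns.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical client-side helper; the names are illustrative only.
public final class OutputDirCleaner {
    private OutputDirCleaner() {}

    /**
     * Recursively deletes dir if it exists.
     * Returns true if something was removed, false if there was nothing to delete.
     */
    public static boolean deleteIfExists(Configuration conf, Path dir) throws IOException {
        // Resolve the file system that owns this path (HDFS, local, ...).
        FileSystem fs = dir.getFileSystem(conf);
        if (!fs.exists(dir)) {
            return false;
        }
        return fs.delete(dir, true); // true = recursive
    }

    public static void main(String[] args) throws IOException {
        Path out = new Path(args[0]);
        boolean removed = deleteIfExists(new Configuration(), out);
        System.out.println(removed ? "Removed " + out : "Nothing to remove at " + out);
    }
}
```

A driver would call deleteIfExists(job.getConfiguration(), outputPath) just before FileOutputFormat.setOutputPath(job, outputPath), which avoids the FileAlreadyExistsException at the cost of silently discarding any earlier results in that directory.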