Pig user mailing list

Re: FileAlreadyExistsException while running pig
Hello Haitao,

    Each time we run a MapReduce job, the job expects its output path not
to exist yet. If the output path is already there, a
FileAlreadyExistsException is thrown when the job is submitted (by
FileOutputFormat.checkOutputSpecs, the top frame of the stack trace
below). Since every Pig script ultimately runs as one or more MapReduce
jobs, Pig has the same requirement.
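To make the check and the remedies in this thread concrete, here is a minimal local-filesystem analogue in plain Java. It is not Hadoop's API: it uses java.nio.file on local disk instead of HDFS, and the method names checkOutputDoesNotExist, deleteRecursively and freshOutputPath are hypothetical. It shows the submission-time refusal, followed by the two ways out: writing to a new path, or removing the old output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputCheckSketch {

    // Local analogue (hypothetical) of the submission-time check: fail if the output directory exists.
    static void checkOutputDoesNotExist(Path output) throws FileAlreadyExistsException {
        if (Files.exists(output)) {
            throw new FileAlreadyExistsException(output.toString(), null, "output directory already exists");
        }
    }

    // Remedy 1: remove the old output. Destructive: the previous results are lost.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Remedy 2: keep the old output and pick a sibling name that does not exist yet.
    static Path freshOutputPath(Path parent, String name) {
        Path candidate = parent.resolve(name);
        for (int i = 1; Files.exists(candidate); i++) {
            candidate = parent.resolve(name + "-" + i);
        }
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("pig-demo");
        Path output = base.resolve("out");
        Files.createDirectories(output);
        Files.writeString(output.resolve("part-00000"), "old result\n");

        try {
            checkOutputDoesNotExist(output);
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        Path alternative = freshOutputPath(base, "out");
        checkOutputDoesNotExist(alternative); // passes: "out-1" does not exist yet
        System.out.println("new path is free: " + alternative.getFileName());

        deleteRecursively(output);
        checkOutputDoesNotExist(output); // passes: the old output was removed
        System.out.println("old path is free again");
    }
}
```

On a real cluster the same choice applies to the HDFS path named in STORE ... INTO: either point it at a new directory, or delete the old one first (for example with Pig's rmf command at the top of the script), accepting that the previous results are gone.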

    Mohammad Tariq
On Fri, Aug 10, 2012 at 11:18 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> Usually that means the directory you are trying to store to already exists. Pig won't overwrite existing data. You should either move or remove the directory, or change the directory name in your STORE statement.
> Alan.
> On Aug 9, 2012, at 7:42 PM, Haitao Yao wrote:
>> Hi all,
>>       I got this error while running a Pig script:
>> 997: Unable to recreate exception from backend error:
>> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 already exists
>>        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecsHelper(PigOutputFormat.java:207)
>>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:188)
>>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:893)
>>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:415)
>>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
>>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:856)
>>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:830)
>>        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>>        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>>        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>>        at java.lang.Thread.run(Thread.java:722)
>> But I checked the script, and the directory hdfs://DC-hadoop01:9000/tmp/pig-temp/temp548500412/tmp-1456742965 is not used by the script explicitly, so I think Pig uses it to store temporary results.
>> So why does it already exist? Isn't it supposed to be unique?
>> Haitao Yao
>> weibo: @haitao_yao
>> Skype:  haitao.yao.final