-
Task Side Effect files and copying(getWorkOutputPath)
Saptarshi Guha 2009-03-16, 15:59
Hello, I would like to produce side-effect files that will later be copied to the output folder. I am using FileOutputFormat, and in the Map's close() method I copy files (from the local tmp/ folder) to FileOutputFormat.getWorkOutputPath(job);
void close() .... {
  if (shouldcopy) {
    ArrayList<Path> lop = new ArrayList<Path>();
    for (String ff : tempdir.list()) {
      lop.add(new Path(temppfx + ff));
    }
    dstFS.moveFromLocalFile(lop.toArray(new Path[]{}), dstPath);
  }
}
However, this throws an error java.io.IOException: `hdfs://X:54310/tmp/testseq/_temporary/_attempt_200903160945_0010_m_000000_0': specified destination directory doest not exist
I thought this was the right place to drop side-effect files. Prior to this I was copying to the output folder, but many files were not copied; or perhaps all of them were copied and many were then deleted during the reduce output stage. I am not sure (when I used NullOutputFormat, all the files were present in the output folder). So I resorted to getWorkOutputPath, which threw the above exception.
So if I'm using FileOutputFormat, and my maps and/or reduces produce side-effect files on the local FS:
1) When should I copy them to the DFS (e.g. in the close() method, or one at a time in the map/reduce method)?
2) Where should I copy them to?
I am using Hadoop 0.19 and have set jobConf.setNumTasksToExecutePerJvm(-1). Also, each side-effect file produced has a unique name, i.e. there is no overwriting.
Thank you,
Saptarshi Guha
-
Re: Task Side Effect files and copying(getWorkOutputPath)
Amareshwari Sriramadasu 2009-03-17, 05:14
Saptarshi Guha wrote:
> Hello,
> I would like to produce side effect files which will be later copied
> to the outputfolder.
> I am using FileOuputFormat, and in the Map's close() method i copy
> files (from the local tmp/ folder) to
> FileOutputFormat.getWorkOutputPath(job);
>
FileOutputFormat.getWorkOutputPath(job) is the correct method to get the directory for task side-effect files.
You should not use the close() method, because promotion to the output directory happens before close(). You can use the configure() method. See org.apache.hadoop.tools.HadoopArchives.
> void close() .... {
>   if (shouldcopy) {
>     ArrayList<Path> lop = new ArrayList<Path>();
>     for(String ff : tempdir.list()){
>       lop.add(new Path(temppfx+ff));
>     }
>     dstFS.moveFromLocalFile(lop.toArray(new Path[]{}), dstPath);
>   }
>
> However, this throws an error java.io.IOException:
> `hdfs://X:54310/tmp/testseq/_temporary/_attempt_200903160945_0010_m_000000_0':
> specified destination directory doest not exist
>
> I though this is the right to place to drop side effect files. Prior
> to this I was copying o the output folder, but many were not copied,
> or in fact all were, but during the reduce output stage many were
> deleted - am not sure(I used NullOutputFormat and all the files were
> present in the output folder) So i resorted to getWorkOutputPath
> which threw the above exception.
>
> So if I'm using FileOutputFormat, and my maps and/or reduces produce
> side effects files on the localFS
> 1)when should I copy them to the DFS (e.g the close method? or one at
> a time in the map/reduce method)
> 2) Where should i copy them to.
>
> I am using Hadoop 0.19 and have set jobConf.setNumTasksToExecutePerJvm(-1);
> Also, each side effect file produced has a unique name, i.e there is
> no overwriting.
>
You need not set jobConf.setNumTasksToExecutePerJvm(-1); even without it, each attempt will have a unique work output path.
Thanks Amareshwari
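The exception quoted in this thread appears to be raised when several source files are moved in one call and the destination directory is not there yet. As a minimal sketch of one way around that (create the destination first, then move), the pattern is shown here against the local filesystem with java.nio.file so that it runs without a cluster. The class name SideEffectMover and the method moveAll are hypothetical and do not come from the thread. On HDFS the analogous step would presumably be a dstFS.mkdirs(dstPath) call before dstFS.moveFromLocalFile(...); that is an assumption the thread does not confirm, and it does not replace the advice to do the setup in configure().

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, local-filesystem analogue of the move in the question.
public class SideEffectMover {

    /**
     * Moves every regular file found directly in srcDir into dstDir.
     * dstDir (and any missing parents) is created first, so the move
     * cannot fail because the destination directory is absent.
     * Returns the new paths of the moved files.
     */
    public static List<Path> moveAll(Path srcDir, Path dstDir) throws IOException {
        // Create the destination up front; a no-op if it already exists.
        Files.createDirectories(dstDir);

        // Collect the sources before moving, so the listing is not
        // disturbed while files are being relocated.
        List<Path> sources = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    sources.add(entry);
                }
            }
        }

        List<Path> moved = new ArrayList<>();
        for (Path src : sources) {
            Path dst = dstDir.resolve(src.getFileName());
            Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
            moved.add(dst);
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Two uniquely named side-effect files in a scratch directory.
        Path src = Files.createTempDirectory("sidefx-src");
        Files.write(src.resolve("part-a.dat"), "a".getBytes());
        Files.write(src.resolve("part-b.dat"), "b".getBytes());

        // A destination that does not exist yet, like a fresh attempt directory.
        Path dst = Files.createTempDirectory("sidefx-out")
                .resolve("_temporary").resolve("attempt_0");

        List<Path> moved = moveAll(src, dst);
        System.out.println("moved " + moved.size() + " files into " + dst);
    }
}
```

Since each side-effect file has a unique name, REPLACE_EXISTING should never actually overwrite anything here; it only keeps a retried call from failing on a file left behind by an earlier attempt.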