I stopped a job that was running very slowly while it was in its reduce phase. However, I still want its output and I cannot run the job again, so I have to work with the intermediate files.
I have a 30GB file, map_0.out (found in the reducer's jobcache), and I want to read its contents using an InputFormat. It is not a SequenceFile; I already tried that. How do I read this file? I presume it is some sort of sorted map of Writable keys with corresponding Writable values, since this file was being fed directly to the reduce function.
Thanks. I successfully created an InputFormat that uses an IFile.Reader. The fact that the files are concatenated did not seem to matter much: I could use a single IFile.Reader to read the entire map_0.out file.
Owen O'Malley wrote:
> The intermediate files are called IFiles. The format is trivial and
> you can read the code to see it. The only tricky bit is that you
> effectively have N IFiles concatenated together (one per a reduce).
>
> -- Owen
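
For anyone landing on this thread later, here is a rough, self-contained sketch of what "N IFiles concatenated together" could look like at the byte level. It does not use Hadoop's IFile.Reader at all. It assumes each record is a variable-length key length, a variable-length value length, then the raw key and value bytes, and that each segment ends with a pair of -1 lengths. That layout, the vint encoding (written from memory of WritableUtils), and the names IFileSketch, readSegment and writeSegment are all assumptions made for illustration. Real files may also carry per-segment checksums or compression, which this sketch ignores, so check the IFile source of your Hadoop version before relying on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class IFileSketch {
    // Assumed end-of-segment marker, written once as key length and once as value length.
    static final int EOF_MARKER = -1;

    // Encode a long in a Hadoop-style variable-length form (assumed to match WritableUtils).
    static void writeVLong(DataOutputStream out, long v) throws IOException {
        if (v >= -112 && v <= 127) {
            out.writeByte((int) v); // small values fit in a single byte
            return;
        }
        int len = -112;
        if (v < 0) {
            v = ~v;      // store the one's complement of negative values
            len = -120;
        }
        for (long tmp = v; tmp != 0; tmp >>= 8) {
            len--;       // the first byte encodes sign and payload size
        }
        out.writeByte(len);
        int payload = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = payload; idx != 0; idx--) {
            out.writeByte((int) (v >> ((idx - 1) * 8)) & 0xFF);
        }
    }

    // Decode a value produced by writeVLong.
    static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;
        }
        boolean negative = first < -120;
        int payload = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < payload; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    // Write one segment: all records, then the assumed EOF marker pair.
    static void writeSegment(DataOutputStream out, String[][] records) throws IOException {
        for (String[] kv : records) {
            byte[] key = kv[0].getBytes(StandardCharsets.UTF_8);
            byte[] val = kv[1].getBytes(StandardCharsets.UTF_8);
            writeVLong(out, key.length);
            writeVLong(out, val.length);
            out.write(key);
            out.write(val);
        }
        writeVLong(out, EOF_MARKER);
        writeVLong(out, EOF_MARKER);
    }

    // Read records until the EOF marker pair; each element is {keyBytes, valueBytes}.
    static List<byte[][]> readSegment(DataInputStream in) throws IOException {
        List<byte[][]> records = new ArrayList<>();
        while (true) {
            int keyLen = (int) readVLong(in);
            int valLen = (int) readVLong(in);
            if (keyLen == EOF_MARKER && valLen == EOF_MARKER) {
                return records;
            }
            if (keyLen < 0 || valLen < 0) {
                throw new IOException("Corrupt record lengths: " + keyLen + ", " + valLen);
            }
            byte[] key = new byte[keyLen];
            byte[] val = new byte[valLen];
            in.readFully(key);
            in.readFully(val);
            records.add(new byte[][] {key, val});
        }
    }

    public static void main(String[] args) throws IOException {
        // Two segments written back to back, standing in for "one per reduce".
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        writeSegment(out, new String[][] {{"apple", "1"}, {"banana", "2"}});
        writeSegment(out, new String[][] {{"cherry", "3"}});

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        for (int segment = 0; segment < 2; segment++) {
            for (byte[][] kv : readSegment(in)) {
                System.out.println(segment + "\t"
                        + new String(kv[0], StandardCharsets.UTF_8) + "\t"
                        + new String(kv[1], StandardCharsets.UTF_8));
            }
        }
    }
}
```

The keys and values come back as raw bytes here. In a real job they would be the serialized Writable forms, so each one would still need to be deserialized with the key and value classes of the original job.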