TextOutputFormat not outputting all keys in Hadoop 0.20? (Saptarshi Guha, 2009-09-05, 19:22)
Hello,
I'm using TextOutputFormat from mapreduce/lib/output with Hadoop 0.20, and it appears not to write all the keys to the output file, even though the write method in the RecordWriter is receiving them. Let me explain.

1) I copied TextOutputFormat unchanged except for some debugging print messages:

    public synchronized void write(K key, V value) throws IOException {
        boolean nullKey = key == null || key instanceof NullWritable;
        boolean nullValue = value == null || value instanceof NullWritable;
        if (nullKey && nullValue) {
            return; // nothing is written for a (null, null) pair
        }
        if (!nullKey) {
            writeObject(key);
        }
        if (!(nullKey || nullValue)) {
            out.write(keyValueSeparator);
        }
        if (!nullValue) {
            writeObject(value);
        }
        out.write(newline);
        // Debug output; string concatenation prints "null" for a null
        // key or value instead of throwing a NullPointerException
        System.out.println("Key=" + key);
        System.out.println("Value=" + value);
    }

I expect 52 keys, corresponding to the upper- and lower-case letters of the alphabet. I get fewer than 52 keys in the output folder: sometimes 44, sometimes some other number, and once even all 52. However, the write method above does receive the missing (K, V) pairs, as the log messages show, i.e. I see Key=(missing key) and Value=(missing value).

Hence, for some reason, the output is either
a) not being written,
b) being written but not flushed/committed, or
c) being written to temporary outputs that are then deleted.

Also, if a given reducer has received, e.g., 5 keys, I see messages for all 5 keys, of which a few (but not all) are missing from the output.

SequenceFileOutputFormat does not have the same issue (all 52 keys are present).

Any ideas? My bug?

Kind Regards
Saptarshi

Version: 0.20.0, r763504
Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
Identifier: 200908281653

Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha
Kindness is a language which the deaf can hear and the blind can read. -- Mark Twain
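Hypothesis (b) in the post, records that reach write() but are never flushed or committed, can be illustrated without Hadoop. The following is a minimal, Hadoop-free sketch, not Hadoop's actual code path: `LineWriter`, `missingKeys` and `demo` are hypothetical names, plain Java `null` stands in for `NullWritable`, and an in-memory `BufferedOutputStream` with an 8 KB buffer stands in for the task's output stream. It reuses the null handling of the quoted write() method and checks which of the 52 expected keys actually reached the underlying sink.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

public class LineWriterSketch {

    // Hypothetical stand-in for the line writer inside TextOutputFormat.
    // Assumption: plain Java null plays the role of Hadoop's NullWritable.
    static final class LineWriter<K, V> {
        private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);
        private final DataOutputStream out;
        private final byte[] keyValueSeparator;

        LineWriter(DataOutputStream out, String separator) {
            this.out = out;
            this.keyValueSeparator = separator.getBytes(StandardCharsets.UTF_8);
        }

        private void writeObject(Object o) throws IOException {
            out.write(o.toString().getBytes(StandardCharsets.UTF_8));
        }

        // Same null handling as the write() method quoted in the post.
        synchronized void write(K key, V value) throws IOException {
            boolean nullKey = key == null;
            boolean nullValue = value == null;
            if (nullKey && nullValue) {
                return; // nothing is emitted for a (null, null) pair
            }
            if (!nullKey) {
                writeObject(key);
            }
            if (!(nullKey || nullValue)) {
                out.write(keyValueSeparator);
            }
            if (!nullValue) {
                writeObject(value);
            }
            out.write(NEWLINE);
        }

        // Closing flushes the buffer; bytes still buffered at that point
        // never reach the sink if close() is skipped.
        synchronized void close() throws IOException {
            out.close();
        }
    }

    // Returns the expected keys (A-Z, a-z) that do not start any output line.
    static Set<String> missingKeys(String output) {
        Set<String> missing = new TreeSet<>();
        for (char c = 'A'; c <= 'Z'; c++) {
            missing.add(String.valueOf(c));
            missing.add(String.valueOf(Character.toLowerCase(c)));
        }
        for (String line : output.split("\n")) {
            missing.remove(line.split("\t", 2)[0]);
        }
        return missing;
    }

    private static String contents(ByteArrayOutputStream sink) {
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    // Writes all 52 keys; returns {missing before close, missing after close}.
    static int[] demo() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The 8 KB buffer is far larger than the ~300 bytes written here,
        // so nothing reaches the sink until the stream is flushed or closed.
        LineWriter<String, Integer> writer = new LineWriter<>(
                new DataOutputStream(new BufferedOutputStream(sink, 8192)), "\t");
        try {
            for (char c = 'A'; c <= 'Z'; c++) {
                char lower = Character.toLowerCase(c);
                writer.write(String.valueOf(c), (int) c);
                writer.write(String.valueOf(lower), (int) lower);
            }
            int before = missingKeys(contents(sink)).size();
            writer.close();
            int after = missingKeys(contents(sink)).size();
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("missing before close=" + r[0] + ", after close=" + r[1]);
    }
}
```

This prints `missing before close=52, after close=0`: every key passed through write(), yet none was visible in the sink until close() flushed the buffer. That matches the pattern hypothesis (b) predicts for a writer that is never closed, but it does not by itself show that this is what happens in the reported job; hypotheses (a) and (c) would need separate checks against the task's output directory.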