TextOutputFormat not outputting all keys in Hadoop 0.20?
I'm using the TextOutputFormat in mapreduce/lib/output with Hadoop
0.20, and it appears that it is not writing all the keys to the output
file even though the write method in the RecordWriter is receiving
them. Let me explain.

1) I copied TextOutputFormat unchanged, except for some added debugging print messages:

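     public synchronized void write(K key, V value) throws IOException {
       // Debugging print statements were added in this method (not shown).
       boolean nullKey = key == null || key instanceof NullWritable;
       boolean nullValue = value == null || value instanceof NullWritable;
       if (nullKey && nullValue) {
         return;
       }
       if (!nullKey) {
         writeObject(key);
       }
       if (!(nullKey || nullValue)) {
         out.write(keyValueSeparator);
       }
       if (!nullValue) {
         writeObject(value);
       }
       out.write(newline);
     }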
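
The null-handling branches of this write method can be tried outside Hadoop. The following is a minimal sketch, not the Hadoop class itself: LineWriterSketch and render are hypothetical names, a plain null stands in for NullWritable, and the tab separator is assumed to match the TextOutputFormat default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the record writer; null replaces NullWritable.
final class LineWriterSketch {
    private static final byte[] SEPARATOR = "\t".getBytes(StandardCharsets.UTF_8);
    private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);
    private final OutputStream out;

    LineWriterSketch(OutputStream out) {
        this.out = out;
    }

    public synchronized void write(Object key, Object value) throws IOException {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        if (nullKey && nullValue) {
            return; // an empty record produces no line at all
        }
        if (!nullKey) {
            out.write(key.toString().getBytes(StandardCharsets.UTF_8));
        }
        if (!(nullKey || nullValue)) {
            out.write(SEPARATOR); // separator only when both sides are present
        }
        if (!nullValue) {
            out.write(value.toString().getBytes(StandardCharsets.UTF_8));
        }
        out.write(NEWLINE);
    }

    // Writes each {key, value} pair to an in-memory stream and returns the text.
    static String render(Object[][] records) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        LineWriterSketch writer = new LineWriterSketch(buffer);
        try {
            for (Object[] record : records) {
                writer.write(record[0], record[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Every non-empty record ends in exactly one newline, so under these assumptions the number of lines in the output should equal the number of write calls that carried a key or a value.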


I expect 52 keys, corresponding to the upper- and lowercase letters of
the alphabet. I usually get fewer than 52 keys in the output folder,
sometimes 44, some times, and once even 52.
/However/, the write method above does receive the missing K,V pairs,
as evidenced by the log file messages, i.e. I see Key=(missing key) and
Hence, for some reason, either a) it is not writing, b) it is writing
but not flushing/committing, or c) the temporary outputs are getting
deleted.
Also, if a given reducer has received e.g. 5 keys, I see log messages
for all 5 keys, of which a few (but not all) are missing from the output.
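
Possibility (b) can be illustrated with plain Java I/O. This is only a sketch of the general effect, not a diagnosis of the Hadoop job: FlushDemo and bytesVisible are hypothetical names, and the 8192-byte buffer size is an arbitrary choice. Bytes handed to a buffered stream do not reach the underlying sink until the buffer fills or the stream is flushed or closed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hadoop.
final class FlushDemo {
    // Writes one line per key through a buffered stream and returns how many
    // bytes have reached the underlying sink. The stream is deliberately not
    // closed, so only an explicit flush pushes the buffered bytes through.
    static int bytesVisible(String[] keys, boolean flushAtEnd) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192);
        try {
            for (String key : keys) {
                out.write((key + "\n").getBytes(StandardCharsets.UTF_8));
            }
            if (flushAtEnd) {
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }
}
```

With five single-letter keys, the sink holds nothing until the flush happens, even though every write call returned normally. A missing flush or close would normally drop the tail of the output, so on its own it may not explain keys missing from the middle of a reducer's output.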

SequenceFileOutputFormat does not have the same issue (all 52 keys are present).
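
To pin down exactly which keys are absent from the text output, the lines of the output files can be compared against the 52 expected keys. The sketch below assumes that each key is a single letter and that key and value are separated by a tab (the TextOutputFormat default). MissingKeys and missing are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Hadoop.
final class MissingKeys {
    // Returns the expected keys (a-z, then A-Z) that never appear as the
    // first tab-separated field of any output line.
    static List<String> missing(List<String> outputLines) {
        Set<String> seen = new HashSet<>();
        for (String line : outputLines) {
            int tab = line.indexOf('\t');
            seen.add(tab >= 0 ? line.substring(0, tab) : line);
        }
        List<String> absent = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            if (!seen.contains(String.valueOf(c))) {
                absent.add(String.valueOf(c));
            }
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            if (!seen.contains(String.valueOf(c))) {
                absent.add(String.valueOf(c));
            }
        }
        return absent;
    }
}
```

The input list could be built by reading every file in the output folder line by line. Comparing the result with the keys that appear in the reducer logs would show whether the same keys go missing on every run.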

Any ideas? Is this my bug?
Kind Regards

Version: 0.20.0, r763504
Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
Identifier: 200908281653

Saptarshi Guha | [EMAIL PROTECTED] | http://www.stat.purdue.edu/~sguha
Kindness is a language which the deaf can hear and the blind can read.
-- Mark Twain