Hadoop, mail # user - Textoutputformat not outputting all keys in Hadoop 0.20?


Re: Textoutputformat not outputting all keys in Hadoop 0.20?
Saptarshi Guha 2009-09-07, 22:12
Hello,
The problem is rather odd. I installed the version you mentioned and
still have the same problem. My HBBytesWritable has a toString method,
which TextOutputFormat calls.

a) If my toString method outputs the bytes (as hex, like the toString method
in BytesWritable), no keys are skipped.
b) If instead my toString calls an external function (which takes a byte[]
and returns a String), some keys are not written to disk, even though
TextOutputFormat receives their bytes (as I mentioned before).
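For context: as far as I understand the 0.20 LineRecordWriter inside TextOutputFormat, it writes a Text key's raw bytes but writes any other key as `toString().getBytes("UTF-8")`. So in case b) whatever the external function returns is exactly what lands in the file. Below is a minimal standalone sketch (no Hadoop dependency; the class and method names are hypothetical) that emulates that encoding step so the two toString variants can be compared byte for byte. It also adds a check for embedded line breaks, which would split one key across several lines of a text file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for TextOutputFormat's LineRecordWriter; no Hadoop classes needed.
final class LineWriterSketch {
  private static final byte[] TAB = "\t".getBytes(StandardCharsets.UTF_8);
  private static final byte[] NEWLINE = "\n".getBytes(StandardCharsets.UTF_8);

  // Emulates the non-Text branch of writeObject: toString() encoded as UTF-8.
  static byte[] encode(Object o) {
    return o.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Builds the bytes of one "key<TAB>value\n" record, assuming the default separator.
  static byte[] record(Object key, Object value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    append(out, encode(key));
    append(out, TAB);
    append(out, encode(value));
    append(out, NEWLINE);
    return out.toByteArray();
  }

  // ByteArrayOutputStream.write(byte[], int, int) does not throw IOException.
  private static void append(ByteArrayOutputStream out, byte[] bytes) {
    out.write(bytes, 0, bytes.length);
  }

  // True if the rendered string would be split across lines in a text output file.
  static boolean breaksLines(String s) {
    return s.indexOf('\n') >= 0 || s.indexOf('\r') >= 0;
  }
}
```

You could call `LineWriterSketch.record(key, value)` with a key whose toString uses the external function, then run `breaksLines` on that string. That is one cheap way to rule the conversion in or out before suspecting the output format itself.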

I'm not sure whether this is a fault in my design or not.
Regards
Saptarshi
On Mon, Sep 7, 2009 at 12:16 PM, Todd Lipcon<[EMAIL PROTECTED]> wrote:
> Hi Saptarshi,
>
> Are you able to reproduce this on the 0.20.1rc1 uploaded last week?
>
> http://people.apache.org/~omalley/hadoop-0.20.1-rc1/
>
> If so, it would be worth putting together a test case. If you can reproduce
> this in a JUnit test (even if it only happens once every few runs) you
> should definitely open a JIRA.
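Because the failure is intermittent, a test for the JIRA would probably need to repeat the job several times. Here is a plain-Java sketch of that repetition idea, without JUnit or Hadoop on the classpath. `trial` is a hypothetical hook standing in for "run the job once and report whether all keys were written".

```java
import java.util.function.BooleanSupplier;

final class FlakyRepro {
  // Runs `trial` `runs` times and returns how many runs reported failure (false).
  // A reproduction that fails only once every few runs still yields a count > 0.
  static int countFailures(int runs, BooleanSupplier trial) {
    int failures = 0;
    for (int i = 0; i < runs; i++) {
      if (!trial.getAsBoolean()) {
        failures++;
      }
    }
    return failures;
  }
}
```

Inside a JUnit test, the natural assertion would then be that the returned count is zero.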
>
> Thanks,
> -Todd
>
> On Sat, Sep 5, 2009 at 12:22 PM, Saptarshi Guha <[EMAIL PROTECTED]>wrote:
>
>> Hello,
>> I'm using the TextOutputFormat in mapreduce/lib/output with Hadoop 0.20,
>> and it appears not to write all the keys to the output file even though
>> the write method in the RecordWriter is receiving them. Let me explain.
>>
>> 1) I copied TextOutputFormat unchanged except for some debugging print statements:
>>
>>    public synchronized void write(K key, V value)
>>      throws IOException {
>>
>>      boolean nullKey = key == null || key instanceof NullWritable;
>>      boolean nullValue = value == null || value instanceof NullWritable;
>>      if (nullKey && nullValue) {
>>        return;
>>      }
>>      if (!nullKey) {
>>        writeObject(key);
>>      }
>>      if (!(nullKey || nullValue)) {
>>        out.write(keyValueSeparator);
>>      }
>>      if (!nullValue) {
>>        writeObject(value);
>>      }
>>      out.write(newline);
>>
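>>      // Debug output; string concatenation is null-safe, unlike key.toString(),
>>      // which would throw an NPE when only the key or only the value is null.
>>      System.out.println("Key=" + key);
>>      System.out.println("Value=" + value);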
>>    }
>>
>> I expect 52 keys, corresponding to the upper- and lower-case letters of the
>> alphabet. I usually get fewer than 52 keys in the output folder (sometimes 44,
>> sometimes some other number), and once even all 52.
>> /However/, the write method above does receive the missing key/value pairs, as
>> the log messages show: I see Key=(missing key) and Value=(missing value).
>> So for some reason the pairs are either a) not being written, b) written but
>> not flushed/committed, or c) written to temporary outputs that then get deleted.
>> Also, if a given reducer has received, e.g., 5 keys, I see debug messages for
>> all 5, yet a few (but not all) of them are missing from the output.
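Hypothesis b) is easy to show in plain Java. Bytes handed to a buffered stream do not reach the underlying sink until the stream is flushed or closed. A minimal sketch follows, assuming a BufferedOutputStream with an 8 KB buffer and an in-memory sink standing in for the HDFS file. It illustrates the general mechanism and is not Hadoop's actual writer stack.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class UnflushedWriteDemo {
  // Writes each line through a buffered stream and reports how many bytes
  // actually reached the sink. close() flushes the buffer first; when
  // close is false the stream is deliberately left open (and unflushed).
  static int bytesReachingSink(String[] lines, boolean close) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink, 8192));
    try {
      for (String line : lines) {
        out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
      }
      if (close) {
        out.close();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.size();
  }
}
```

With a few short lines, `bytesReachingSink(lines, false)` returns 0 and `bytesReachingSink(lines, true)` returns the full byte count. That pattern would match a write() call that runs but leaves nothing on disk.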
>>
>> SequenceFileOutputFormat does not have the same issue (all 52 keys are present).
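To see exactly which keys go missing in a given run, you can compare the part files against the expected set. A minimal sketch follows. It assumes each key's toString renders as the bare letter and that the default tab separator is used. `outputLines` stands for the concatenated lines of all part-* files; reading those files from HDFS is left out.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class MissingKeyCheck {
  // The 52 expected keys: "A".."Z" followed by "a".."z".
  static Set<String> expectedKeys() {
    Set<String> keys = new LinkedHashSet<>();
    for (char c = 'A'; c <= 'Z'; c++) keys.add(String.valueOf(c));
    for (char c = 'a'; c <= 'z'; c++) keys.add(String.valueOf(c));
    return keys;
  }

  // Takes the text before the first tab on each line as the key and
  // returns the expected keys that never appear, in sorted order.
  static Set<String> missingKeys(List<String> outputLines) {
    Set<String> seen = new HashSet<>();
    for (String line : outputLines) {
      int tab = line.indexOf('\t');
      seen.add(tab >= 0 ? line.substring(0, tab) : line);
    }
    Set<String> missing = new TreeSet<>(expectedKeys());
    missing.removeAll(seen);
    return missing;
  }
}
```

Running the same comparison over the keys read back from the SequenceFileOutputFormat output should give an empty set, consistent with that observation.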
>>
>> Any ideas? Is this my bug?
>> Kind Regards
>> Saptarshi
>>
>> Version: 0.20.0, r763504
>> Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
>> Identifier: 200908281653
>>
>>
>>
>> Saptarshi Guha | [EMAIL PROTECTED] |
>> http://www.stat.purdue.edu/~sguha
>> Kindness is a language which the deaf can hear and the blind can read.
>>                -- Mark Twain
>>
>>
>