Hadoop >> mail # user >> Textoutputformat not outputting all keys in Hadoop 0.20?


Re: Textoutputformat not outputting all keys in Hadoop 0.20?
Hello,
The problem is rather odd. I installed the version you mentioned and
still have the same problem. My HBBytesWritable has a toString method,
which TextOutputFormat calls.

a) If my toString method outputs the bytes (like the toString method
in BytesWritable), I do not have any skipped keys.
b) If instead my toString calls an external function (given a byte[],
it returns a String), then although TextOutputFormat receives the
bytes (as I mentioned before), the key does not get written to disk.
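
Since TextOutputFormat writes a non-Text key as the UTF-8 bytes of its toString() result, one thing that might be worth ruling out in case (b) is whether the returned string survives a line-oriented round trip. The following is only a minimal sketch under stated assumptions: it does not use Hadoop at all, the class ToStringRoundTrip and its methods are hypothetical names made up for illustration, and the in-memory buffer stands in for the real output file. It writes key, tab, value, newline for each record and then reports which keys cannot be found again when the output is read back line by line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: imitates the "toString() then
// UTF-8 bytes" path for keys and values and checks the result line by line.
class ToStringRoundTrip {

    /** Writes key TAB value NEWLINE for every key into an in-memory buffer. */
    static byte[] writeRecords(List<?> keys, Object value) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            for (Object key : keys) {
                out.write(key.toString().getBytes(StandardCharsets.UTF_8));
                out.write('\t');
                out.write(value.toString().getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        } // closing the stream flushes it
        return sink.toByteArray();
    }

    /** Returns the keys whose toString() form is not the key field of any output line. */
    static List<String> missingKeys(List<?> keys, byte[] output) throws IOException {
        Set<String> seen = new HashSet<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(output), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                seen.add(tab < 0 ? line : line.substring(0, tab));
            }
        }
        List<String> missing = new ArrayList<>();
        for (Object key : keys) {
            if (!seen.contains(key.toString())) {
                missing.add(key.toString());
            }
        }
        return missing;
    }

    /** The 52 upper- and lower-case letters used as test keys. */
    static List<String> alphabetKeys() {
        List<String> keys = new ArrayList<>();
        for (char c = 'a'; c <= 'z'; c++) {
            keys.add(String.valueOf(c));
            keys.add(String.valueOf(Character.toUpperCase(c)));
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        List<String> keys = alphabetKeys();
        System.out.println("missing with plain keys: "
                + missingKeys(keys, writeRecords(keys, "v")));
        // Assumption for illustration: one toString() result contains a newline.
        keys.set(0, "a\nb");
        System.out.println("missing with one multi-line key: "
                + missingKeys(keys, writeRecords(keys, "v")).size());
    }
}
```

If the strings produced by the external function pass this kind of check, the toString() output itself is probably not what makes keys look skipped, and the flushing and temporary-output explanations become more likely.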

Not sure whether this is my design fault or not.
Regards
Saptarshi
On Mon, Sep 7, 2009 at 12:16 PM, Todd Lipcon<[EMAIL PROTECTED]> wrote:
> Hi Saptarshi,
>
> Are you able to reproduce this on the 0.20.1rc1 uploaded last week?
>
> http://people.apache.org/~omalley/hadoop-0.20.1-rc1/
>
> If so, it would be worth putting together a test case. If you can reproduce
> this in a JUnit test (even if it only happens once every few runs) you
> should definitely open a JIRA.
>
> Thanks,
> -Todd
>
> On Sat, Sep 5, 2009 at 12:22 PM, Saptarshi Guha <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>> I'm using the TextOutputFormat in mapreduce/lib/output with Hadoop 0.20
>> and it appears it is not writing all the keys to the output file, even though
>> the write method in the RecordWriter is receiving them. Let me explain.
>>
>> 1) I copied TextOutputFormat unchanged, save for some added debugging print messages:
>>
>>    public synchronized void write(K key, V value)
>>      throws IOException {
>>
>>      boolean nullKey = key == null || key instanceof NullWritable;
>>      boolean nullValue = value == null || value instanceof NullWritable;
>>      if (nullKey && nullValue) {
>>        return;
>>      }
>>      if (!nullKey) {
>>        writeObject(key);
>>      }
>>      if (!(nullKey || nullValue)) {
>>        out.write(keyValueSeparator);
>>      }
>>      if (!nullValue) {
>>        writeObject(value);
>>      }
>>      out.write(newline);
>>
>>      // Debug output; plain concatenation also tolerates a null key or value.
>>      System.out.println("Key=" + key);
>>      System.out.println("Value=" + value);
>>    }
>>
>> I expect 52 keys, corresponding to the upper- and lower-case letters of the
>> alphabet. I usually get fewer than 52 keys in the output folder (44 on some
>> runs), and once got all 52.
>> /However/, the write method above does receive the missing K,V pairs, as
>> evidenced by the log file messages, i.e. I see Key=(missing key) and
>> Value=(missing value).
>> Hence, for some reason, a) it is not writing, b) it is writing but not
>> flushing/committing, or c) the temporary outputs are getting deleted.
>> Also, if a given reducer has received e.g. 5 keys, I see messages for all 5
>> keys, of which a few (but not all) are missing from the output.
>>
>> SequenceFileOutputFormat does not have the same issue (all 52 keys are present).
>>
>> Any ideas? Is this my bug?
>> Kind Regards
>> Saptarshi
>>
>> Version: 0.20.0, r763504
>> Compiled: Thu Apr 9 05:18:40 UTC 2009 by ndaley
>> Identifier: 200908281653
>>
>>
>>
>> Saptarshi Guha | [EMAIL PROTECTED] |
>> http://www.stat.purdue.edu/~sguha
>> Kindness is a language which the deaf can hear and the blind can read.
>>                -- Mark Twain
>>
>>
>