Re: Why is single reducer called twice? - Hadoop - [mail # user]
...Thank you, worked like a charm  On Tue, Jul 28, 2009 at 12:33 AM, Ted Dunning  wrote:  ...
   Author: Mark Kerzner, 2009-07-28, 16:52
Using bytes or strings as keys? - Hadoop - [mail # user]
...Hi, I want to use the hash (SHA-1) for the key. Should I use the binary representation in bytes, or the ASCII representation, as given by the sha1sum utility in Linux? Which is faster, and a...
   Author: Mark Kerzner, 2009-07-26, 23:51
Breaking up maps into separate files? - Hadoop - [mail # user]
...Hi, in the Reducer, I take each map and break its value into three pieces: binary piece, text piece, and a descriptor. I want to collect the binary pieces all in one output zip file, the tex...
   Author: Mark Kerzner, 2009-07-24, 20:35
Re: Use text vs binary output for speed? - Hadoop - [mail # user]
...Maybe it was slow for me because I was writing from file system to HDFS, but now that I am using Amazon's MR, it will be OK. Thank you, Mark  On Fri, Jul 24, 2009 at 3:19 PM, Owen O'Mal...
   Author: Mark Kerzner, 2009-07-24, 20:26
Re: zip files as input - Hadoop - [mail # user]
...Thanks, Kris, reading it now! Mark  On Tue, Jul 7, 2009 at 12:07 PM, Kris Jirapinyo wrote:  ...
   Author: Mark Kerzner, 2009-07-07, 17:17
Re: Parallel maps - Hadoop - [mail # user]
...That's awesome information, Marcus. I am working on a project which would require a similar architectural solution (although unlike you I can't broadcast the details), so that was very usefu...
   Author: Mark Kerzner, 2009-07-02, 20:34
Re: graphical tool for hadoop mapreduce - Hadoop - [mail # user]
...Tom, this is so much right on time! Bravo, Karmasphere. I installed the plugins, and nothing crashed - in fact, I get the same screens as the manual promises.  It is worth reading this ...
   Author: Mark Kerzner, 2009-06-26, 15:55
Put computation in Map or in Reduce - Hadoop - [mail # user]
...Hi,  in an MR step, I need to extract text from various files (using Tika). I have put text extraction into reduce(), because I am writing the extracted text to the output on HDFS. But ...
   Author: Mark Kerzner, 2009-04-21, 03:25
Re: Performance question - Hadoop - [mail # user]
...Arun, thank you very much for the answer. I will turn off the combiner. I am debugging intermediate MR steps now, so I am mostly interested in performance for this, and real tuning will ...
   Author: Mark Kerzner, 2009-04-20, 15:37
Re: Broder or other near-duplicate algorithms? - Hadoop - [mail # user]
...Yi-Kai, that's good to know - and I have read this article - but is your code available?  Thank you, Mark  On Tue, Mar 24, 2009 at 9:51 AM, Yi-Kai Tsai  wrote:  ...
   Author: Mark Kerzner, 2009-03-24, 18:23
