Re: Configuring SSH - is it required for a pseudo-distributed mode? - MapReduce - [mail # user]
...I never configure the ssh feature, neither for running on a single node nor for a full-size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
   Author: Niels Basjes, 2013-05-19, 09:03
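The ssh-free approach can be sketched as each node starting its own daemons locally. This sketch assumes a Hadoop 1.x layout, where `bin/hadoop-daemon.sh start <daemon>` launches one daemon on the current machine. The convenience script `start-all.sh` is the part that relies on ssh, because it contacts every host in `conf/slaves`. The helper `start_commands` and the role-to-daemon mapping are hypothetical and only illustrate the idea.

```python
# Hypothetical helper: build the commands that start Hadoop 1.x daemons
# locally with hadoop-daemon.sh, instead of using start-all.sh (which uses
# ssh to reach every host listed in conf/slaves).
from typing import Dict, List

# Assumed mapping of node roles to the daemons they run (illustrative only).
ROLE_DAEMONS: Dict[str, List[str]] = {
    "master": ["namenode", "jobtracker"],
    "worker": ["datanode", "tasktracker"],
}


def start_commands(roles: List[str], hadoop_bin: str = "bin") -> List[List[str]]:
    """Return one argv list per daemon that should run on *this* node."""
    commands = []
    for role in roles:
        for daemon in ROLE_DAEMONS[role]:
            commands.append([f"{hadoop_bin}/hadoop-daemon.sh", "start", daemon])
    return commands


if __name__ == "__main__":
    # Pseudo-distributed mode: one machine plays both roles.
    for argv in start_commands(["master", "worker"]):
        print(" ".join(argv))
        # To actually launch a daemon: subprocess.run(argv, check=True)
```

On a real cluster, each machine would run only the commands for its own role, for example from an init script.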
Re: How to process only input files containing 100% valid rows - MapReduce - [mail # user]
...How about a different approach: if you use the multiple outputs option, you can process the valid lines in the normal way and put the invalid lines in a separate output file. On Apr 18, ...
   Author: Niels Basjes, 2013-04-19, 08:21
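The valid/invalid split can be sketched as a local simulation without a cluster. In a real job this routing would go through Hadoop's `MultipleOutputs` class. Here the named outputs are just dictionary keys. The validity rule, exactly three non-empty tab-separated fields, is a hypothetical stand-in for whatever "100% valid" means for a given dataset.

```python
# Local simulation of routing records to named outputs, in the spirit of
# Hadoop's MultipleOutputs. Output names and is_valid() are hypothetical.
from collections import defaultdict
from typing import Dict, Iterable, List


def is_valid(line: str) -> bool:
    """Assumed rule: a valid row has exactly three non-empty tab-separated fields."""
    fields = line.split("\t")
    return len(fields) == 3 and all(fields)


def map_with_side_output(lines: Iterable[str]) -> Dict[str, List[str]]:
    outputs: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if is_valid(line):
            outputs["valid"].append(line)    # normal processing path
        else:
            outputs["invalid"].append(line)  # separate output for later inspection
    return dict(outputs)


if __name__ == "__main__":
    rows = ["a\tb\tc", "broken row", "x\ty\tz", "1\t\t3"]
    result = map_with_side_output(rows)
    print(result["valid"])
    print(result["invalid"])
```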
Re: how to find top N values using map-reduce ? - MapReduce - [mail # user]
...My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want the top N%, you'll need an additional phase to determine how many ...
   Author: Niels Basjes, 2013-02-02, 12:44
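The secondary-sort suggestion works like this: if the composite key makes the shuffle deliver every record to one reducer in descending order of value, the reducer only has to emit the first N records it sees. The sketch below simulates that locally. Python's `sorted` stands in for Hadoop's sort phase, and the `(name, score)` record layout is an assumption.

```python
# Local simulation: the "shuffle" sorts all (name, score) records by score
# descending (what a secondary sort on a composite key would arrange), and a
# single "reducer" keeps only the first N. Record layout is hypothetical.
from itertools import islice
from typing import Iterable, List, Tuple

Record = Tuple[str, int]


def shuffle_sorted(records: Iterable[Record]) -> List[Record]:
    # Stand-in for the framework sort: highest score first, ties broken by name.
    return sorted(records, key=lambda r: (-r[1], r[0]))


def single_reducer_top_n(sorted_records: Iterable[Record], n: int) -> List[Record]:
    # The input is already ordered, so the reducer can stop after n records.
    return list(islice(sorted_records, n))


if __name__ == "__main__":
    data = [("a", 5), ("b", 42), ("c", 17), ("d", 42), ("e", 1)]
    print(single_reducer_top_n(shuffle_sorted(data), 3))
    # [('b', 42), ('d', 42), ('c', 17)]
```

For the top N% variant, the reducer would first need the total record count from the additional phase, so it can convert the percentage into an absolute N.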
Re: What is the preferred way to pass a small number of configuration parameters to a mapper or reducer - MapReduce - [mail # user]
...F. Put a MongoDB replica set on all Hadoop worker nodes and let the tasks query MongoDB at localhost (this is what I did recently with a multi-GiB dataset). Best regards...
   Author: Niels Basjes, 2012-12-30, 19:38
Re: Doubts on compressed file - MapReduce - [mail # user]
...Hi, Yes. Yes, and then the mapper will read the other parts of the file over the network. So what I do is upload such files with a larger HDFS block size so that the mapper has "th...
   Author: Niels Basjes, 2012-11-07, 12:47
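The block-size trick can be made concrete with a small helper that picks a block size large enough to hold a whole non-splittable file in one block. A single mapper can then read the entire file from a local block. Rounding up to whole MiB is an assumption. HDFS requires the block size to be a multiple of the checksum chunk size (512 bytes by default), and any whole number of MiB satisfies that. The value could then be passed at upload time, for example as `-D dfs.blocksize=<bytes>` to `hadoop fs -put` (older releases call the property `dfs.block.size`).

```python
# Hypothetical helper: smallest block size, rounded up to whole MiB, that
# fits an entire file into a single HDFS block.
MIB = 1024 * 1024


def block_size_for(file_size_bytes: int, granularity: int = MIB) -> int:
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    # Ceiling division, then scale back up to a multiple of the granularity.
    units = -(-file_size_bytes // granularity)
    return units * granularity


if __name__ == "__main__":
    size = 300 * MIB + 1  # e.g. a gzipped log just over 300 MiB
    bs = block_size_for(size)
    print(bs, bs // MIB)  # 315621376 301
```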
Re: Hadoop Real time help - MapReduce - [mail # user]
...Thanks for the pointers, I have stuff to read now :) On Mon, Aug 20, 2012 at 9:37 AM, Bertrand Dechoux wrote: Best regards, Niels Ba...
   Author: Niels Basjes, 2012-08-22, 18:21
Re: output/input ratio > 1 for map tasks? - MapReduce - [mail # user]
...Hi, On Mon, Jul 30, 2012 at 8:47 PM, brisk wrote: For a simple example, have a look at the WordCount example. The input of a single map call is 1 record: "This is a line"...
   Author: Niels Basjes, 2012-07-30, 20:15
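The WordCount case can be shown with a plain-Python version of its map step. One input record (a line) produces one output record per word, so for "This is a line" the map's output/input record ratio is 4. This sketch mirrors the logic of the standard example, not its Hadoop API.

```python
# Plain-Python version of the WordCount map step (not the Hadoop API):
# one input record (a line of text) -> one (word, 1) pair per token.
from typing import List, Tuple


def wordcount_map(line: str) -> List[Tuple[str, int]]:
    # Split on whitespace, roughly as the classic example's tokenizer does.
    return [(word, 1) for word in line.split()]


if __name__ == "__main__":
    out = wordcount_map("This is a line")
    print(out)       # [('This', 1), ('is', 1), ('a', 1), ('line', 1)]
    print(len(out))  # 4 output records for 1 input record
```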
Making gzip splittable for Hadoop - MapReduce - [mail # user]
...Hi, In many Hadoop production environments you get gzipped files as the raw input. Usually these are Apache HTTPD log files. When putting these gzipped files into Hadoop you are stuck w...
   Author: Niels Basjes, 2012-03-30, 14:07
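Being stuck with gzipped input comes down to split computation. A splittable file is cut into roughly one split per block. A file compressed with a non-splittable codec such as gzip becomes a single split, and therefore a single map task. The function below is a simplified model of that. It ignores the minimum/maximum split-size settings and the small tolerance the real `FileInputFormat` applies to the last split.

```python
# Simplified model of how many input splits (map tasks) one file gets.
MIB = 1024 * 1024


def num_splits(file_size: int, block_size: int, splittable: bool) -> int:
    """Approximate number of input splits for one non-empty file."""
    if file_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    if not splittable:
        return 1  # e.g. a .gz file: one mapper reads the whole stream
    return -(-file_size // block_size)  # ceiling division: one split per block


if __name__ == "__main__":
    size = 1000 * MIB  # e.g. a 1000 MiB log file
    print(num_splits(size, 128 * MIB, splittable=True))   # 8
    print(num_splits(size, 128 * MIB, splittable=False))  # 1
```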
Re: Merge sorting reduce output files - MapReduce - [mail # user]
...Hi,  On Thu, Mar 1, 2012 at 00:07, Robert Evans  wrote:   No worries.    What we have has a lot more features. Yet the basic idea of what we have is similar enough t...
   Author: Niels Basjes, 2012-03-01, 14:23
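Each reducer writes its own sorted output file, so combining them into one sorted result is a classic k-way merge. This is a minimal local sketch using Python's `heapq.merge`. It assumes every part's lines are already sorted with the same ordering used for the merge. The file contents are made up.

```python
# k-way merge of already-sorted reducer outputs (e.g. part-r-00000, ...).
# Assumes every input is sorted with the same ordering as the merge.
import heapq
from typing import Iterable, List


def merge_sorted_parts(parts: Iterable[Iterable[str]]) -> List[str]:
    # heapq.merge streams lazily, holding one pending line per input;
    # list() materializes the result here only for the demo.
    return list(heapq.merge(*parts))


if __name__ == "__main__":
    part0 = ["apple\t3", "melon\t1"]
    part1 = ["banana\t7", "pear\t2"]
    part2 = ["cherry\t5"]
    for line in merge_sorted_parts([part0, part1, part2]):
        print(line)
```

With real files, open file objects can be passed directly, because they iterate line by line.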
Should splittable Gzip be a "core" hadoop feature? - MapReduce - [mail # user]
...Hi,  Some time ago I had an idea and implemented it.  Normally you can only run a single gzipped input file through a single mapper and thus only on a single CPU core. What I creat...
   Author: Niels Basjes, 2012-02-28, 15:50
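One way such a scheme can work is assumed here, since the truncated message does not spell it out. Gzip cannot be entered at an arbitrary offset, so every "mapper" decompresses the stream from the beginning and discards everything before its split. It then processes only the lines that start inside its split. This spends extra decompression CPU but lets several cores share one file. For simplicity, the split boundaries below are offsets in the decompressed text, whereas a real codec would have to work with positions in the compressed stream. The names `read_split` and `make_splits` are hypothetical.

```python
# Local simulation of splitting one gzip stream across several "mappers".
# Assumption: split boundaries are offsets in the *decompressed* text; every
# mapper still has to decompress from byte 0 because gzip is not seekable.
import gzip
import io
from typing import List, Tuple


def read_split(gz_data: bytes, start: int, end: int) -> List[str]:
    """Return the lines whose first byte lies in [start, end)."""
    results = []
    offset = 0
    with gzip.GzipFile(fileobj=io.BytesIO(gz_data)) as stream:
        for raw in stream:  # decompresses sequentially from the start
            if offset >= end:
                break  # past our split: the next mapper owns the rest
            if offset >= start:
                results.append(raw.decode("utf-8").rstrip("\n"))
            # lines before `start` are decompressed and discarded (wasted CPU)
            offset += len(raw)
    return results


def make_splits(total: int, n: int) -> List[Tuple[int, int]]:
    """Cut [0, total) into n roughly equal ranges."""
    size = -(-total // n)
    return [(i * size, min((i + 1) * size, total)) for i in range(n)]


if __name__ == "__main__":
    text = "".join(f"log line {i}\n" for i in range(10))
    gz = gzip.compress(text.encode("utf-8"))
    splits = make_splits(len(text.encode("utf-8")), 3)
    per_mapper = [read_split(gz, s, e) for s, e in splits]
    print([len(lines) for lines in per_mapper])  # [4, 3, 3]
```

Each line is assigned to exactly one split, the one containing its first byte, so no record is lost or duplicated across mappers.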
