[MAPREDUCE-2094] org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. - MapReduce - [issue]
...When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 3...
http://issues.apache.org/jira/browse/MAPREDUCE-2094    Author: Niels Basjes, 2014-07-27, 21:49
[MAPREDUCE-5928] Deadlock allocating containers for mappers and reducers - MapReduce - [issue]
...I have a small cluster consisting of 8 desktop class systems (1 master + 7 workers). Due to the small memory of these systems I configured yarn as follows: yarn.nodemanager.resource.memory-mb ...
http://issues.apache.org/jira/browse/MAPREDUCE-5928    Author: Niels Basjes, 2014-06-18, 16:34
[MAPREDUCE-5925] NLineInputFormat silently produces garbage on gzipped input - MapReduce - [issue]
...[ Found while investigating the impact of MAPREDUCE-2094 ] The org.apache.hadoop.mapreduce.lib.input.NLineInputFormat (probably the mapred version too) only makes sense for splittable files. T...
http://issues.apache.org/jira/browse/MAPREDUCE-5925    Author: Niels Basjes, 2014-06-13, 10:44
Re: stop generating these "part-XXXX" empty files when using MultipleOutputs in mapreduce job - MapReduce - [mail # user]
...Use the LazyOutputFormat.  Have a look at this: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/output/LazyOutputFormat.html and http://stackoverflow.com/quest...
   Author: Niels Basjes, 2013-10-28, 19:31
Generating mysql or sqlite datafiles from Hadoop (Java)? - MapReduce - [mail # user]
...Hi,  I remember hearing a while ago that (if I remember correctly) Facebook had an outputformat that wrote the underlying MySQL database files directly from a MapReduce job.  For m...
   Author: Niels Basjes, 2013-09-17, 14:39
Re: Why LineRecordWriter.write(..) is synchronized - MapReduce - [mail # user]
...I expect the impact on the IO speed to be almost 0 because waiting for a single disk seek is longer than many thousands of calls to a synchronized method.  Niels On Aug 11, 2013 3:00 PM...
3 emails    Author: Niels Basjes, 2013-08-11, 16:02
Re: Is there any way to use a hdfs file as a Circular buffer? - MapReduce - [mail # user]
...A circular file on hdfs is not possible.  Some of the ways around this limitation: - Create a series of files and delete the oldest file when you have too much. - Put the data into an h...
   Author: Niels Basjes, 2013-07-24, 22:22
Re: gz containing null chars? - MapReduce - [mail # user]
...My best guess is that at a low level a string is often terminated by having a null byte at the end. Perhaps that's where the difference lies. Perhaps the gz decompressor simply stops at the ...
   Author: Niels Basjes, 2013-06-10, 20:27
Re: Experimental Hadoop Cluster - Linux Windows machines - MapReduce - [mail # user]
...I've installed CentOS on several different types of old (originally Windows XP)  Dell desktops for the last 4 years (i.e. desktops as old as 7 years ago) and so far installing CentOS wa...
2 emails    Author: Niels Basjes, 2013-06-01, 20:27
Re: Configuring SSH - is it required? for a psedo distriburted mode? - MapReduce - [mail # user]
...I never configure the ssh feature. Not for running on a single node and not for a full size cluster. I simply start all the required daemons (name/data/job/task) and configure them on which ...
   Author: Niels Basjes, 2013-05-19, 09:03
