How do _you_ document your hadoop jobs? - MapReduce - [mail # user]
...We've taken to documenting our Hadoop jobs in a simple visual manner using PPT (attached example). I wonder how others document their jobs? We often add notes to the text secti...
   Author: David Parks, 2013-02-25, 09:11
RE: How can I limit reducers to one-per-node? - MapReduce - [mail # user]
...Looking at the Job File for my job I see that this property is set to 1, however I have 3 reducers per node (I’m not clear what configuration is causing this behavior). My prob...
   Author: David Parks, 2013-02-09, 04:46
RE: Tricks to upgrading Sequence Files? - MapReduce - [mail # user]
...I'll consider a patch to the SequenceFile, if we could manually override the sequence file input Key and Value that's read from the sequence file headers we'd have a clean solution.  I ...
   Author: David Parks, 2013-01-30, 02:17
Symbolic links available in 1.0.3? - MapReduce - [mail # user]
...Is it possible to use symbolic links in 1.0.3? If yes: can I use symbolic links to create a single, final directory structure of files from many locations; then use DistCp/S3Di...
   Author: David Parks, 2013-01-29, 03:31
RE: Fastest way to transfer files - MapReduce - [mail # user]
...Here’s an example of running distcp (actually in this case s3distcp, but it’s about the same, just new DistCp()) from java: ToolRunner.run(getConf(), new S3DistCp(), new String...
   Author: David Parks, 2012-12-29, 10:29
What does mapred.map.tasksperslot do? - MapReduce - [mail # user]
...I didn't come up with much in a google search. In particular, what are the side effects of changing this setting? Memory? Sort process? I'm guessing it means that...
   Author: David Parks, 2012-12-27, 08:21
How to troubleshoot OutOfMemoryError - MapReduce - [mail # user]
...I'm pretty consistently seeing a few reduce tasks fail with OutOfMemoryError (below). It doesn't kill the job, but it slows it down. In my current case the reducer is pretty da...
   Author: David Parks, 2012-12-22, 04:33
OutOfMemory in ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory - MapReduce - [mail # user]
...I've got 15 boxes in a cluster, 7.5GB of ram each on AWS (m1.large), 1 reducer per node. I'm seeing this exception sometimes. It's not stopping the job from completing, it's ju...
   Author: David Parks, 2012-12-17, 05:36
How to submit Tool jobs programmatically in parallel? - MapReduce - [mail # user]
...I'm submitting unrelated jobs programmatically (using AWS EMR) so they run in parallel.  I'd like to run an s3distcp job in parallel as well, but the interface to that job is a Tool, e....
   Author: David Parks, 2012-12-14, 04:39
RE: Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException - MapReduce - [mail # user]
...If anyone follows this thread in the future: it turns out that I was being led astray by these errors; they weren't the cause of the problem. This was the resolution: http://stackover...
   Author: David Parks, 2012-12-14, 04:25
