[MAPREDUCE-1820] InputSampler does not create a deep copy of the key object when creating a sample, which causes problems with some formats like SequenceFile<Text,Text> - MapReduce - [issue]
...I tried to use the InputSampler on a SequenceFile<Text,Text> and found that it comes up with duplicate keys in the sample.  The problem was tracked down to the fact that the Text ...
http://issues.apache.org/jira/browse/MAPREDUCE-1820    Author: Alex Kozlov, 2013-05-08, 22:13
Re: How can I reduce the number of nodes used by a job - MapReduce - [mail # user]
...Hi Steve, there is no simple way to just limit the number of nodes as it would involve moving the data:  You want to have the 3 replicas on the 5,10,20 nodes, correct?  You could p...
   Author: Alex Kozlov, 2011-12-16, 00:08
Re: What is the cost of using a counter - MapReduce - [mail # user]
...Hi Steve, One thing to keep in mind is that the counters in Hadoop are passed via heartbeats, so you'll see updates only every 2 seconds or so.  I have seen implementations with 1,000s of cou...
   Author: Alex Kozlov, 2011-12-08, 19:21
Re: When is mapred-site.xml read? - MapReduce - [mail # user]
...*keep.failed.task.files* is also set by the client (also, HDFS block size, replication level, *io.sort.{mb,factor}*, etc.)  On Tue, Jun 21, 2011 at 7:15 AM, John Armstrong wrote:  ...
   Author: Alex Kozlov, 2011-06-21, 14:29
Re: mapred.child.java.opts question - MapReduce - [mail # user]
...There might be different reasons why this parameter is not passed to the slave JVM: for example, it might have been declared final.  Do you see the correct parameter in your job xml fil...
   Author: Alex Kozlov, 2011-06-14, 15:47
Re: 2011-06-10 13:14:29,767 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to server1/ : Address already in use - MapReduce - [mail # user]
...Can you be more explicit for the benefit of others?  How can a history folder result in the "Address already in use" exception?  On Fri, Jun 10, 2011 at 2:33 PM, Shuja Rehman  ...
   Author: Alex Kozlov, 2011-06-10, 21:42
Re: Hadoop Mapreduce jobs and LD_LIBRARY_PATH - MapReduce - [mail # user]
...In the "standalone" hadoop application, try setting `export HADOOP_OPTS="-Djava.library.path=..."` w/o explicitly setting LD_LIBRARY_PATH.  On Mon, May 2, 2011 at 8:07 AM, Alex Kozlov ...
   Author: Alex Kozlov, 2011-05-02, 15:14
[MAPREDUCE-1808] Have a configurable metric reporting CPU/disk usage per user - MapReduce - [issue]
...Many organizations are looking at resource usage per department/group/user for diagnostic and resource allocation purposes.  It should be straightforward to implement a metric showing t...
http://issues.apache.org/jira/browse/MAPREDUCE-1808    Author: Alex Kozlov, 2011-03-07, 08:02
Re: Is it possible to determine the source of a value in the Mapper? - MapReduce - [mail # user]
...There is a way to get the file name in the new mapreduce API:  fileName = ((FileSplit) context.getInputSplit()).getPath().toString();  You usually do it in the setup() method. ...
   Author: Alex Kozlov, 2011-02-16, 21:37
Re: -libjars? - MapReduce - [mail # user]
...Try using "-files HStats-1A18.jar" as well: it will put it into Distributed Cache on the HDFS cluster...  -- Alex K  On Fri, Dec 10, 2010 at 10:08 AM, Todd Lipcon  wrote: ...
   Author: Alex Kozlov, 2010-12-11, 23:04
