zheyi rong 2013-03-27, 10:38
While using System.out inside a Mapper or Reducer is fine as an aid to
learning, be careful: accidentally leaving those calls in (or not moving to
something like Log4j) and then running the job for real can mean writing
millions of lines of log output on a tasktracker, filling up its disks and
making jobs needlessly slow.
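As an illustration, here is a minimal sketch of the "log at debug level instead of printing" habit. It assumes java.util.logging as a stand-in for Log4j so that the example needs no extra jars (with Log4j the same pattern would be LOG.isDebugEnabled() / LOG.debug(...)). processRecord() stands in for the body of map(), and it and linesLogged() are hypothetical names, not Hadoop APIs. Per-record messages go out at FINE, so with the logger at INFO nothing is written, whereas a System.out.println() call would write one line per record regardless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/**
 * Sketch only: java.util.logging stands in for Log4j, and processRecord()
 * is a hypothetical stand-in for the per-record work inside a Mapper's map().
 */
final class QuietMapperSketch {
    private static final Logger LOG = Logger.getLogger(QuietMapperSketch.class.getName());

    // Per-record work: logs at FINE (debug) instead of calling System.out.println.
    static int processRecord(String line) {
        // The message is only built and emitted when FINE is enabled for this logger.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("processing record: " + line);
        }
        return line.length();
    }

    // Keeps emitted log records in memory so the effect of the level is countable.
    static final class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord r) { if (isLoggable(r)) records.add(r); }
        @Override public void flush() {}
        @Override public void close() {}
    }

    // Runs n records through processRecord at the given level; returns lines logged.
    static int linesLogged(Level level, int n) {
        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.ALL);
        LOG.setUseParentHandlers(false); // keep the demo quiet on the console
        LOG.addHandler(handler);
        LOG.setLevel(level);
        try {
            for (int i = 0; i < n; i++) {
                processRecord("record-" + i);
            }
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.records.size();
    }

    public static void main(String[] args) {
        System.out.println("INFO level: " + linesLogged(Level.INFO, 100_000) + " lines");
        System.out.println("FINE level: " + linesLogged(Level.FINE, 1_000) + " lines");
    }
}
```

linesLogged(Level.INFO, n) returns 0 for any n and linesLogged(Level.FINE, n) returns n, so turning the detail back on becomes a configuration change rather than a code change.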
On 27 March 2013 10:38, zheyi rong <[EMAIL PROTECTED]> wrote:
> It depends on your needs. If you would like overall statistics, for
> example the number of malformed records in your datasets,
> use counters. If you just want to know what is going on inside a mapper or
> reducer, use System.out.println().
> Since mappers do not know about each other, you cannot get overall statistics
> for your job by using System.out.println().
> The output of System.out.println() ends up in each task's log.
> In a distributed environment, mappers do not know about each other. Imagine
> that mapper A is running on one machine and mapper B is running on another
> machine: in mapper A, you cannot get the internal state of mapper B
> simply by calling System.out.println().
> Harsh J answered it.
> 2013/3/27 Sai Sai <[EMAIL PROTECTED]>
>> Q1. Is it right to assume that System.out.println statements are useful only
>> in the Eclipse environment, and that
>> in a multi-node cluster environment we need to use counters?
>> Q2. I am slightly confused, as it appears that using System.out.println
>> we are able to get detailed info at every line of code in Eclipse, while
>> counters give just a few lines and are not as detailed as System.out.println
>> statements. So what should we do in a multi-node cluster environment?
>> Q3. Also, when they say the limit on counters is 120, does that mean that
>> in the output, if we use:
>> more than 120 times, it will not print it? Or does it refer to 120 options
>> of counters in an enum that we can define?
>> Any help is really appreciated.
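To make the point about counters versus per-mapper output concrete, here is a plain-Java simulation (not Hadoop code) under stated assumptions: each simulated map task counts only the records in its own split, and a separate aggregate() step plays the part the framework plays when it sums a counter over all tasks. In a real job the mapper would call context.getCounter(SomeEnum.CONSTANT).increment(1) and never do the summing itself; RecordQuality, runMapTask(), aggregate() and the "exactly two comma-separated fields" validity rule are all hypothetical.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch only: simulates per-task counters being summed into job-wide totals.
 * In Hadoop the summing is done by the framework, not by user code.
 */
final class CounterAggregationSketch {
    // Hypothetical counter enum; in Hadoop each enum constant becomes one counter.
    enum RecordQuality { WELL_FORMED, MALFORMED }

    // One simulated map task: it sees only its own split and keeps local counts.
    static Map<RecordQuality, Long> runMapTask(List<String> split) {
        Map<RecordQuality, Long> local = new EnumMap<>(RecordQuality.class);
        for (String line : split) {
            // Hypothetical rule: a valid record has exactly two comma-separated fields.
            RecordQuality q = line.split(",", -1).length == 2
                    ? RecordQuality.WELL_FORMED
                    : RecordQuality.MALFORMED;
            local.merge(q, 1L, Long::sum);
        }
        return local;
    }

    // Simulated framework step: sums every task's counters into job totals.
    static Map<RecordQuality, Long> aggregate(List<Map<RecordQuality, Long>> perTask) {
        Map<RecordQuality, Long> total = new EnumMap<>(RecordQuality.class);
        for (Map<RecordQuality, Long> task : perTask) {
            task.forEach((k, v) -> total.merge(k, v, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two splits, as if processed by mapper A and mapper B on different machines.
        Map<RecordQuality, Long> a = runMapTask(List.of("a,1", "b,2", "broken"));
        Map<RecordQuality, Long> b = runMapTask(List.of("c,3", "d,4,extra", "e", "f,6"));
        System.out.println("mapper A alone sees: " + a);
        System.out.println("mapper B alone sees: " + b);
        System.out.println("job total:           " + aggregate(List.of(a, b)));
    }
}
```

Here mapper A sees 1 malformed record and mapper B sees 2, and only the aggregated map reports the job-wide figure of 3. A println inside runMapTask() would only ever show one task's view, which is the situation described above for mapper A and mapper B.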
Harsh J 2013-03-27, 10:27