Thank you Robert and Lohit for providing the info.
In my case, using the Text input format, I am reading a line but emitting it two
On 17 Oct 2012 10:02, "lohit" <[EMAIL PROTECTED]> wrote:
> As Robert said, if your job is mainly IO intensive and the CPUs are idle, then
> having LZO would improve your overall job performance.
> In your case it looks like the job you are running is not IO bound and
> seems to spend its CPU time compressing/decompressing the data.
> It also depends on the kind of data. Some datasets might not be
> compressible (e.g. random data); in those cases you would end up wasting CPU
> cycles, and it is better to turn off compression for such jobs.
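A rough way to check the point about incompressible data before enabling compression on a job is to compress a sample and look at the ratio. The sketch below is only an illustration: it uses Python's standard zlib module as a stand-in (not LZO, so the exact numbers will differ), and the sample inputs and the helper name compression_ratio are made up for this example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    if not data:
        return 1.0
    return len(zlib.compress(data)) / len(data)


# Hypothetical samples: a highly repetitive log-like text and random bytes.
text_like = b"2012-10-16 INFO mapper finished split\n" * 10_000
random_like = os.urandom(len(text_like))

print(f"repetitive text: {compression_ratio(text_like):.3f}")
print(f"random bytes:    {compression_ratio(random_like):.3f}")
```

If the ratio for a sample of the real input is close to 1.0, the CPU spent on compressing and decompressing probably buys very little.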
> 2012/10/16 Robert Dyer <[EMAIL PROTECTED]>
>> Hi Manoj,
>> If the data is the same for both tests and the number of mappers is
>> fewer, then each mapper has more (uncompressed) data to process. Thus
>> each mapper should take longer and overall execution time should increase.
>> As a simple example: if your data is 128MB uncompressed it may use 2
>> mappers, each processing 64MB of data (1 HDFS block per map task).
>> However, if you compress the data and it is now say 60MB, then one map
>> task will get the entire input file, decompress the data (to 128MB),
>> and process it.
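The arithmetic in the example above can be sketched as a small calculation. This is a simplification under stated assumptions, not Hadoop's actual split logic: it assumes a 64MB block size, one map task per block for splittable input, and a single map task for a file that cannot be split. The function name estimate_map_tasks is hypothetical.

```python
import math


def estimate_map_tasks(file_size_mb: float,
                       block_size_mb: float = 64,
                       splittable: bool = True) -> int:
    """Rough estimate of map tasks for one input file.

    Assumes one map task per HDFS block for splittable input and a
    single map task for a non-splittable (e.g. compressed) file.
    """
    if file_size_mb <= 0:
        return 0
    if not splittable:
        return 1
    return math.ceil(file_size_mb / block_size_mb)


# 128MB uncompressed: two map tasks, each processing 64MB.
print(estimate_map_tasks(128))
# 60MB compressed: one map task, which still has to process all
# 128MB once the data is decompressed.
print(estimate_map_tasks(60, splittable=False))
```

Note that the 60MB file fits in a single block, so it gets one map task in this estimate whether or not it is splittable.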
>> On Tue, Oct 16, 2012 at 9:27 PM, Manoj Babu <[EMAIL PROTECTED]> wrote:
>> > Hi All,
>> > When using LZO compression, the file size is drastically reduced and the
>> > number of mappers is reduced, but the overall execution time increases. I
>> > think that is because the mappers deal with the same amount of data.
>> > Is this the expected behavior?
>> > Cheers!
>> > Manoj.
> Have a Nice Day!