MapReduce, mail # user - Reg LZO compression


Re: Reg LZO compression
Manoj Babu 2012-10-18, 17:33
Thank you Robert and Lohit for providing the info.

In my case, using TextInputFormat, I am reading a line but emitting it twice.
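For reference, here is a minimal sketch of such a mapper under TextInputFormat, emitting each input line twice; the class name and output types are illustrative, not taken from the actual job.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper matching the description above: TextInputFormat hands it
// one line at a time, and it writes that line out twice.
public class DuplicatingLineMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    context.write(line, NullWritable.get());
    context.write(line, NullWritable.get());
  }
}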
On 17 Oct 2012 10:02, "lohit" <[EMAIL PROTECTED]> wrote:
>
> As Robert said, if your job is mainly IO intensive and the CPUs are idle,
> then having lzo would improve your overall job performance.
> In your case it looks like the job you are running is not IO bound and
> seems to take up CPU in compressing/decompressing the data.
> It also depends on the kind of data. Some datasets might not be
> compressible (e.g. random data); in those cases you would end up wasting
> CPU cycles and it is better to turn off compression for such jobs.
>
>
> 2012/10/16 Robert Dyer <[EMAIL PROTECTED]>
>>
>> Hi Manoj,
>>
>> If the data is the same for both tests and the number of mappers is
>> fewer, then each mapper has more (uncompressed) data to process.  Thus
>> each mapper should take longer and overall execution time should
>> increase.
>>
>> As a simple example: if your data is 128MB uncompressed it may use 2
>> mappers, each processing 64MB of data (1 HDFS block per map task).
>> However, if you compress the data and it is now say 60MB, then one map
>> task will get the entire input file, decompress the data (to 128MB),
>> and process it.
>>
>> On Tue, Oct 16, 2012 at 9:27 PM, Manoj Babu <[EMAIL PROTECTED]> wrote:
>> > Hi All,
>> >
>> > When using lzo compression the file size is drastically reduced and the
>> > number of mappers is reduced, but the overall execution time is
>> > increased. I assume that is because the mappers still deal with the
>> > same amount of data.
>> >
>> > Is this the expected behavior?
>> >
>> > Cheers!
>> > Manoj.
>> >
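For concreteness, a rough sketch of the split arithmetic Robert describes above; the block and file sizes are the example values from his message, and the class name is illustrative.

public class SplitArithmetic {
  public static void main(String[] args) {
    long blockSize    = 64L  << 20;  // 64 MB HDFS block, as in the example
    long uncompressed = 128L << 20;  // 128 MB of plain text input
    long compressed   = 60L  << 20;  // roughly 60 MB after LZO compression

    // Plain text is splittable, so it is cut on HDFS block boundaries.
    long plainTextSplits = (uncompressed + blockSize - 1) / blockSize;  // 2 map tasks

    // An un-indexed .lzo file is not splittable: the whole file becomes one
    // split, so a single map task decompresses and processes all 128 MB.
    long lzoSplits = 1;

    System.out.println("plain text map tasks: " + plainTextSplits);  // 2
    System.out.println("lzo map tasks:        " + lzoSplits);        // 1
  }
}

So even though the compressed job reads less from disk, it loses the parallelism of the second map task and spends extra CPU on decompression, which matches the slowdown reported in this thread.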
>
>
>
>
> --
> Have a Nice Day!
> Lohit
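Tying Lohit's advice back to job setup: whether to spend CPU on compression can be decided per job. Below is a minimal sketch of the relevant knobs, assuming the separately installed hadoop-lzo codec; the property names are the Hadoop 1.x-era ones and the helper class is illustrative.

import org.apache.hadoop.conf.Configuration;

public class CompressionToggle {
  // IO-bound job with idle CPU: compress intermediate map output with LZO so
  // less data is spilled to disk and shuffled over the network.
  public static Configuration withLzoMapOutput(Configuration conf) {
    conf.setBoolean("mapred.compress.map.output", true);
    conf.set("mapred.map.output.compression.codec",
             "com.hadoop.compression.lzo.LzoCodec");  // shipped by hadoop-lzo
    return conf;
  }

  // CPU-bound job or poorly compressible (e.g. random) data: skip compression
  // so no cycles are wasted on it.
  public static Configuration withoutCompression(Configuration conf) {
    conf.setBoolean("mapred.compress.map.output", false);
    conf.setBoolean("mapred.output.compress", false);
    return conf;
  }
}

Either configuration is then handed to the job as usual; the point is that the tradeoff Lohit describes can be decided per job rather than once for the whole cluster.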