Re: Lzo vs SequenceFile for big file
Hi,

It would be interesting to see the jobs' statistics (counters).

Thanks
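For anyone trying to reproduce this, a block-compressed Snappy SequenceFile job output in a CDH3-era setup is typically produced with old-API (mapred) properties along these lines. This is a sketch of typical settings, not necessarily the exact configuration Park used:

```xml
<!-- Illustrative mapred-API settings (CDH3 era) for writing a
     block-compressed Snappy SequenceFile as job output. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```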

On Fri, Sep 7, 2012 at 3:25 AM, Young-Geun Park
<[EMAIL PROTECTED]> wrote:
> Hi, All
>
> I have tested which format performs better, LZO or SequenceFile, for a BIG
> file.
>
> The file size is 10 GiB and a WordCount MR job is used.
> The inputs to WordCount are: an LZO file indexed by LzoIndexTool (lzo),
> a SequenceFile compressed with block-level Snappy (seq), and the
> uncompressed original file (none).
>
> Map output is compressed in all cases except the uncompressed-file run.
> The MapReduce job output is not compressed in any case.
>
> The WordCount MR running times were as follows:
>
> none     lzo      seq
> 248s     243s     1410s
>
> -Test Environments
>
> OS : CentOS 5.6 (x64) (kernel = 2.6.18)
> # of Core  : 8 (cpu = Intel(R) Xeon(R) CPU E5504  @ 2.00GHz)
> RAM : 18GB
> Java version : 1.6.0_26
> Hadoop version : CDH3U2
> # of datanode(tasktracker) :  8
>
> According to the results, the running time with the SequenceFile input is
> much greater than with the others.
> Before testing, I had expected the results for SequenceFile and LZO to be
> about the same.
>
> Why is the performance of the SequenceFile compressed with Snappy
> so bad?
>
> Am I missing anything in the tests?
>
>
> Regards,
> Park
>
>
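For completeness, the splittable-LZO input in a test like this is usually prepared by building an index over the compressed file first, e.g. with the indexer shipped in the hadoop-lzo package. The jar path and file name below are placeholders, shown only as an illustration:

```shell
# Build a .lzo.index file so MapReduce can split the LZO input
# across mappers (without an index, the whole 10 GiB file would
# be processed by a single mapper).
hadoop jar /path/to/hadoop-lzo.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /data/bigfile.lzo
```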

--
Best Regards,
Ruslan Al-Fakikh