MapReduce >> mail # user >> Lzo vs SequenceFile for big file


Re: Lzo vs SequenceFile for big file
Hi,

It would be interesting to see the jobs' statistics (counters).

Thanks

On Fri, Sep 7, 2012 at 3:25 AM, Young-Geun Park
<[EMAIL PROTECTED]> wrote:
> Hi, All
>
> I have tested which method performs better for a BIG file: LZO or
> SequenceFile.
>
> The file size is 10 GiB and a WordCount MR job is used.
> The WordCount inputs are: an LZO file indexed by LzoIndexTool (lzo), a
> sequence file compressed with block-level Snappy (seq), and the
> uncompressed original file (none).
>
> Map output is compressed except in the uncompressed-file case. MapReduce
> output is not compressed in any case.
>
> The following are the WordCount MR running times:
>
> none     lzo      seq
> 248s     243s     1410s
>
> -Test Environments
>
> OS : CentOS 5.6 (x64) (kernel = 2.6.18)
> # of Core  : 8 (cpu = Intel(R) Xeon(R) CPU E5504  @ 2.00GHz)
> RAM : 18GB
> Java version : 1.6.0_26
> Hadoop version : CDH3U2
> # of datanode(tasktracker) :  8
>
> According to the results, the running time of the SequenceFile input is
> much greater than the others.
> Before testing, I had expected the SequenceFile and LZO results to be
> about the same.
>
> I want to know why the performance of the sequence file compressed with
> Snappy is so bad.
>
> Am I missing anything in the tests?
>
>
> Regards,
> Park
>
>
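For reference, "block level snappy" SequenceFile output in CDH3 (Hadoop 0.20-era property names) is usually driven by job properties along these lines. The thread does not show the actual configuration, so this is only a sketch of what the "seq" case presumably used:

```xml
<!-- Assumed job configuration for producing the "seq" input
     (not shown in the thread): block-compressed SequenceFile output
     using the Snappy codec, with map output compression enabled. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
```

These properties only take effect when the job writing the file uses SequenceFileOutputFormat; the equivalent new-API names differ in later Hadoop versions.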

--
Best Regards,
Ruslan Al-Fakikh
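As a back-of-envelope check on the figures quoted above, the effective input throughput for each case can be computed directly (10 GiB of input divided by the reported wall-clock time):

```python
# Back-of-envelope throughput from the figures reported in the thread:
# 10 GiB of input processed in 248 s (none), 243 s (lzo) and 1410 s (seq).
input_mib = 10 * 1024  # 10 GiB expressed in MiB

times_s = {"none": 248, "lzo": 243, "seq": 1410}

for fmt, secs in times_s.items():
    mib_per_s = input_mib / secs
    print(f"{fmt:>4}: {mib_per_s:.1f} MiB/s")
# The "seq" case reads at roughly a sixth of the rate of the other two,
# which is why the counters (read bytes, spilled records, GC time) would
# help pinpoint where the time goes.
```

This only restates the reported numbers; it does not explain the slowdown, but it makes the size of the gap concrete.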