It would be interesting to see the jobs' statistics (counters).
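Once the counters of two runs are available (for example, copied by hand from the JobTracker web UI for the lzo and seq jobs), lining them up side by side makes the differences easy to spot. Below is a minimal Python sketch of such a side-by-side comparison. The helper names (`compare_counters`, `print_comparison`) are hypothetical and not part of Hadoop. The input is assumed to be a plain dict mapping counter name to value for each run.

```python
from typing import Dict, List, Tuple


def compare_counters(
    baseline: Dict[str, int], candidate: Dict[str, int]
) -> List[Tuple[str, int, int, float]]:
    """Return (name, baseline, candidate, candidate/baseline) for shared counters.

    Counters present in only one of the two runs are skipped.
    """
    rows = []
    for name in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[name], candidate[name]
        if base == 0:
            # Avoid division by zero: equal zeros mean "no change",
            # anything else is reported as an unbounded increase.
            ratio = 1.0 if cand == 0 else float("inf")
        else:
            ratio = cand / base
        rows.append((name, base, cand, ratio))
    # Highest candidate/baseline ratios first; ties keep alphabetical order.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def print_comparison(baseline: Dict[str, int], candidate: Dict[str, int]) -> None:
    """Print one aligned line per shared counter."""
    for name, base, cand, ratio in compare_counters(baseline, candidate):
        print(f"{name:<28} {base:>15} {cand:>15} {ratio:>9.2f}x")
```

Counters whose ratio is far from 1.0 (for instance bytes read, spilled records, or CPU time, if those counters are present in the runs being compared) would be the first places to look.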
On Fri, Sep 7, 2012 at 3:25 AM, Young-Geun Park
<[EMAIL PROTECTED]> wrote:
> Hi, All
> I have tested which format performs better for a BIG file, LZO or
> SequenceFile. The file size is 10 GiB, and the WordCount MR job is used.
> The WordCount inputs are an LZO file indexed with LzoIndexTool (lzo), a
> SequenceFile compressed with block-level Snappy (seq), and the
> uncompressed original file (none).
> Map output is compressed in every case except the uncompressed file. The
> MapReduce job output is not compressed in any case.
> The WordCount MR running times are as follows:
>   none    lzo     seq
>   248s    243s    1410s
> Test environment:
> OS : CentOS 5.6 (x64) (kernel = 2.6.18)
> # of cores : 8 (cpu = Intel(R) Xeon(R) CPU E5504 @ 2.00GHz)
> RAM : 18GB
> Java version : 1.6.0_26
> Hadoop version : CDH3U2
> # of datanodes (tasktrackers) : 8
> According to the results, the running time of the SequenceFile is much
> longer than the others.
> Before testing, I had expected the SequenceFile and LZO results to be about
> the same.
> I would like to know why the performance of the Snappy-compressed
> SequenceFile is so bad.
> Am I missing anything in the tests?
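To put the reported timings in proportion, the short Python sketch below computes how much slower the seq run was than the other two, and the effective rate at which each job worked through the 10 GiB of uncompressed data. It uses only the numbers given in the post. "Throughput" here is an assumption of this sketch: it divides the uncompressed input size by wall-clock job time, regardless of how large the compressed lzo and seq files actually were on HDFS.

```python
# Wall-clock job times reported in the post, in seconds.
RUN_TIMES_S = {"none": 248, "lzo": 243, "seq": 1410}
# Uncompressed input size stated in the post (10 GiB), expressed in MiB.
INPUT_MIB = 10 * 1024


def slowdown(times, run, reference):
    """How many times longer `run` took than `reference`."""
    return times[run] / times[reference]


def throughput_mib_s(times, run, input_mib=INPUT_MIB):
    """Uncompressed input processed per second of job wall-clock time."""
    return input_mib / times[run]


if __name__ == "__main__":
    for ref in ("none", "lzo"):
        print(f"seq vs {ref}: {slowdown(RUN_TIMES_S, 'seq', ref):.1f}x slower")
    for run in RUN_TIMES_S:
        print(f"{run:<5} {throughput_mib_s(RUN_TIMES_S, run):6.1f} MiB/s")
```

By this measure the seq run is roughly 5.7x slower than none and 5.8x slower than lzo, or about 7.3 MiB/s versus about 41 to 42 MiB/s for the other two.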