Re: Do we need to install both 32-bit and 64-bit lzo2 to enable LZO compression, and how can we use the gzip compression codec in Hadoop?
Hong Tang 2010-05-18, 18:11
See my comments inline.
On May 18, 2010, at 8:44 AM, stan lee wrote:
> Hi Guys,
> I am trying to use compression to reduce the I/O workload when running
> a job, but it failed. I have several questions that I need your help with.
> For LZO compression, I found a guide at
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it
> say "Note that you must have both 32-bit and 64-bit liblzo2 installed"?
> I am not sure whether it means that we also need the 32-bit liblzo2
> installed even when we are on a 64-bit system. If so, why?
The answer on the wiki page addresses how to set up the native libraries
so that both 32-bit AND 64-bit Java would work. If you stick to the same
flavor of Java (all 32-bit or all 64-bit) across the whole cluster, that
advice does not apply to you.
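To see which flavor a given JVM is (and therefore which liblzo2 build it can load), a small Java program can print its data model. This is a sketch, not part of Hadoop: the class name `JvmBitness` is hypothetical, `sun.arch.data.model` is a HotSpot/OpenJDK-specific property, and the `os.arch` fallback is only a heuristic.

```java
import java.util.Locale;

/** Hypothetical helper: reports whether this JVM is 32-bit or 64-bit. */
public class JvmBitness {

    /**
     * Returns 32 or 64 based on the JVM data model.
     * "sun.arch.data.model" is set by HotSpot/OpenJDK JVMs; other JVMs may not
     * set it, so we fall back to guessing from "os.arch" (a heuristic only).
     */
    static int dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model) || "64".equals(model)) {
            return Integer.parseInt(model);
        }
        String arch = System.getProperty("os.arch", "").toLowerCase(Locale.ROOT);
        return arch.contains("64") ? 64 : 32;
    }

    public static void main(String[] args) {
        // A 64-bit JVM can only load 64-bit native libraries (e.g. a 64-bit liblzo2),
        // and a 32-bit JVM only 32-bit ones, regardless of the OS kernel's bitness.
        System.out.println("JVM data model: " + dataModel() + "-bit");
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        System.out.println("java.library.path: " + System.getProperty("java.library.path"));
    }
}
```

Running it with each `java` binary used on the cluster shows whether you need the 32-bit liblzo2, the 64-bit one, or both.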
> Also, if I don't use LZO compression and instead try to use gzip to
> compress the final reduce output file, I just set the value below in
> mapred-site.xml, but it doesn't seem to work. (How can I find the final
> compressed .gz file? I used "hadoop dfs -ls <dir>" and didn't find it.)
> My question: can we use gzip to compress the final result when it's not
> a streaming job? How can we verify that compression has been enabled
> during a job execution?
The truth is, this option is honored (or ignored) by the OutputFormat
implementation itself. If you use TextOutputFormat, then you should see
files like "part-xxxx.gz" in the output directory. If you write your own
output format class, then you should follow the implementations of
TextOutputFormat or SequenceFileOutputFormat to set up compression properly.
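In the old `org.apache.hadoop.mapred` API, job output compression is usually switched on from the driver with `FileOutputFormat.setCompressOutput(conf, true)` and `FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class)`. The Hadoop-free sketch below only mimics, under simplifying assumptions, what a compressing TextOutputFormat does with that setting: it appends the codec's extension to the part file name, wraps the stream in a gzip compressor, and writes tab-separated records. It also checks the gzip magic bytes (0x1f 0x8b), which is one way to confirm a part file really is compressed. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Local sketch (no Hadoop) of a gzip-compressed "part-xxxxx.gz" text output file. */
public class GzipPartFileSketch {

    /** Writes "key\tvalue\n" records to dir/<name>.gz and returns the file path. */
    static Path writePart(Path dir, String name, List<String[]> records) throws IOException {
        Path file = dir.resolve(name + ".gz"); // plays the role of the codec's default extension
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8)) {
            for (String[] kv : records) {
                out.write(kv[0] + "\t" + kv[1] + "\n");
            }
        } // closing the writer finishes the gzip trailer
        return file;
    }

    /** True if the file starts with the gzip magic bytes 0x1f 0x8b. */
    static boolean looksGzipped(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    /** Decompresses the file and returns its lines. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        List<String[]> records = List.of(new String[] {"apple", "3"}, new String[] {"pear", "5"});
        Path part = writePart(dir, "part-00000", records);
        System.out.println(part.getFileName() + " gzipped? " + looksGzipped(part));
        System.out.println(readLines(part));
    }
}
```

On a real cluster, the equivalent check is to list the job's output directory with `hadoop dfs -ls <dir>` and look for the `.gz` suffix on the part files.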