Re: Do we need to install both 32-bit and 64-bit lzo2 to enable LZO compression, and how can we use the gzip compression codec in Hadoop?
See my comments inline.
On May 18, 2010, at 8:44 AM, stan lee wrote:
> Hi guys,
> I am trying to use compression to reduce the I/O workload when running
> a job, but it failed. I have several questions I need your help with.
> For LZO compression, I found a guide at
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it
> say "Note that you must have both 32-bit and 64-bit liblzo2 installed"?
> I am not sure whether it means that we also need 32-bit liblzo2
> installed even when we are on a 64-bit system. If so, why?
The answer on that wiki page addresses how to set up the native
libraries so that both 32-bit AND 64-bit Java work. A JVM can only load
native libraries built for its own data model, so if you run the same
flavor of Java (all 32-bit or all 64-bit) across the whole cluster, that
advice does not apply to you: install only the liblzo2 that matches your JVM.
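To find out which flavor of Java a node actually runs (and therefore which liblzo2 build it needs), a tiny check like the one below may help. This is a hypothetical helper, not part of Hadoop. It assumes a HotSpot-based JVM, which sets the `sun.arch.data.model` system property; on other JVMs that property may be missing, so the code falls back to a rough guess from `os.arch` that can be wrong on some platforms.

```java
public class JvmBitness {

    // Hypothetical helper: returns "32" or "64" for the running JVM.
    // "sun.arch.data.model" is set by HotSpot-based JVMs; it may be absent
    // (or "unknown") elsewhere, so fall back to a heuristic on "os.arch".
    static String dataModel() {
        String model = System.getProperty("sun.arch.data.model");
        if (model != null && (model.equals("32") || model.equals("64"))) {
            return model;
        }
        // Heuristic only: names like "amd64", "x86_64", "aarch64" contain "64".
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "32";
    }

    public static void main(String[] args) {
        String bits = dataModel();
        System.out.println("JVM is " + bits + "-bit (os.arch="
                + System.getProperty("os.arch") + ")");
        // A JVM can only load native libraries of its own data model.
        System.out.println("Native libraries such as liblzo2 must also be "
                + bits + "-bit.");
    }
}
```

Run it with the same `java` binary your TaskTrackers use; if every node reports the same value, you only need the matching liblzo2.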
> Also, if I don't use LZO compression and instead try to use gzip to
> compress the final reduce output file, I just set the value below in
> mapred-site.xml, but it doesn't seem to work. (How can I find the final
> compressed .gz file? I used "hadoop dfs -ls <dir>" and didn't find it.)
> My question: can we use gzip to compress the final result when it's not
> a streaming job? How can we tell that compression has been enabled
> during a job execution?
This option is honored (or ignored) by each OutputFormat implementation,
so it works for non-streaming jobs too, as long as the output format
supports it. If you use TextOutputFormat, you should see files like
"part-xxxx.gz" in the output directory. If you write your own output
format class, follow the implementations of TextOutputFormat or
SequenceFileOutputFormat to set up compression properly.
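In the old `org.apache.hadoop.mapred` API, output compression is normally switched on with `FileOutputFormat.setCompressOutput(conf, true)` and `FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class)`, which back the `mapred.output.compress` and `mapred.output.compression.codec` properties. The sketch below does not use Hadoop at all: it is a hypothetical, stdlib-only imitation of what an output format has to do to honor that flag (append the codec's extension to the part file name and wrap the stream in a compressor), plus a check for the gzip magic bytes `0x1f 0x8b` as one way to confirm a file really is compressed. The `Properties` object is an assumed stand-in for `JobConf`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

public class CompressedOutputSketch {

    // Hypothetical record-writer stand-in. It reads the same property name
    // Hadoop uses, but from a plain Properties object instead of a JobConf.
    static Path writePart(Path dir, String name, Properties conf,
                          Iterable<String> lines) throws IOException {
        boolean compress = Boolean.parseBoolean(
                conf.getProperty("mapred.output.compress", "false"));
        // Like TextOutputFormat, add the codec's extension (".gz" for gzip).
        Path file = dir.resolve(compress ? name + ".gz" : name);
        OutputStream raw = Files.newOutputStream(file);
        OutputStream out = compress ? new GZIPOutputStream(raw) : raw;
        // Closing the writer finishes the gzip trailer and closes the file.
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
        return file;
    }

    // Every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
    static boolean looksGzipped(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        Properties conf = new Properties();
        conf.setProperty("mapred.output.compress", "true");
        Path part = writePart(dir, "part-00000", conf,
                Arrays.asList("a\t1", "b\t2"));
        System.out.println(part.getFileName() + " gzipped=" + looksGzipped(part));
    }
}
```

Running `main` prints `part-00000.gz gzipped=true`; with the property unset, the file is written as plain `part-00000` and the magic-byte check returns false.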