Hadoop, mail # user - Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop

stan lee 2010-05-18, 15:44
Ted Yu 2010-05-18, 15:51
Harsh J 2010-05-18, 17:31
stan lee 2010-05-19, 07:17
stan lee 2010-05-19, 08:31
stan lee 2010-05-19, 10:38
Ranjit Mathew 2010-05-19, 10:59
Re: Do we need to install both 32 and 64 bit lzo2 to enable lzo compression and how can we use gzip compression codec in hadoop
Hong Tang 2010-05-18, 18:11

See my comments inline.

Thanks, Hong

On May 18, 2010, at 8:44 AM, stan lee wrote:

> Hi Guys,
> I am trying to use compression to reduce the IO workload when running a
> job, but it failed. I have several questions that need your help.
> For lzo compression, I found a guide at
> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it say
> "Note that you must have both 32-bit and 64-bit liblzo2 installed"? I am
> not sure whether it means that we also need 32-bit liblzo2 installed even
> when we are on a 64-bit system. If so, why?

The answer on the wiki page addresses how to set up the native libraries
so that both 32-bit AND 64-bit Java would work. If you stick to the same
flavor of Java (all 32-bit or all 64-bit) across the whole cluster, that
advice does not apply to you; you only need the liblzo2 build that matches
your JVM.
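Which liblzo2 build you need follows from the JVM's data model, which the JVM itself can report. The sketch below prints it and builds a platform string of the form os.name-os.arch-data-model with spaces turned into underscores. Assumption: this mirrors how the 0.20-era bin/hadoop script picks a native directory under lib/native/ (e.g. Linux-amd64-64 versus Linux-i386-32); check the layout of your own installation. The class name is hypothetical, and sun.arch.data.model is a HotSpot/OpenJDK property that other JVMs may not set.

```java
/** Sketch: report which native-library flavor (32- or 64-bit) this JVM needs. */
public class JvmFlavor {
    // Builds "<os.name>-<os.arch>-<data model>" with spaces replaced by
    // underscores, e.g. "Linux-amd64-64" (assumed lib/native naming scheme).
    static String platformName(String osName, String osArch, String dataModel) {
        return (osName + "-" + osArch + "-" + dataModel).replace(' ', '_');
    }

    public static void main(String[] args) {
        // "32" or "64" on HotSpot/OpenJDK; falls back to "unknown" elsewhere.
        String model = System.getProperty("sun.arch.data.model", "unknown");
        String name = platformName(System.getProperty("os.name"),
                                   System.getProperty("os.arch"), model);
        System.out.println("JVM data model: " + model + "-bit");
        System.out.println("Expected native dir: lib/native/" + name);
        System.out.println("java.library.path: " + System.getProperty("java.library.path"));
    }
}
```

Run it with the same java binary your TaskTrackers use: a 32-bit JVM on a 64-bit OS still needs the 32-bit liblzo2, which is the mixed case the FAQ is covering.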

> Also, if I don't use lzo compression and instead try to use gzip to
> compress the final reduce output file, I just set the value below in
> mapred-site.xml, but it doesn't seem to work. (How can I find the final
> compressed .gz file? I used "hadoop dfs -ls <dir>" and didn't find it.)
> My questions: can we use gzip to compress the final result when it's not
> a streaming job? How can we make sure that compression has been enabled
> during a job's execution?
> <property>
>       <name>mapred.output.compress</name>
>       <value>true</value>
> </property>

In fact, it is the OutputFormat implementation that honors this option.
If you use TextOutputFormat, you should see files like "part-xxxx.gz" in
the output directory. If you write your own output format class, follow
the implementations of TextOutputFormat or SequenceFileOutputFormat to set
up compression properly.
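To make the point above concrete, here is a plain-JDK sketch (no Hadoop classes) of the pattern an output format follows: read the mapred.output.compress flag and, if it is true, give the file a .gz extension and wrap the stream in a gzip compressor. The class and method names are hypothetical, and gzip is assumed to be the configured codec. The isGzip helper shows one way to confirm that compression took effect: a gzip file starts with the magic bytes 0x1f 0x8b, so a part file copied out of HDFS (for example with hadoop dfs -get) can be checked locally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

/** Sketch: how an output format might honor "mapred.output.compress". */
public class GzipOutputSketch {
    static final String COMPRESS_KEY = "mapred.output.compress";

    // Opens a part file in dir. When the flag is "true", appends ".gz" and wraps
    // the stream in a gzip compressor (gzip assumed to be the chosen codec).
    static OutputStream openPartFile(Properties conf, Path dir, String name) throws IOException {
        boolean compress = Boolean.parseBoolean(conf.getProperty(COMPRESS_KEY, "false"));
        Path file = dir.resolve(compress ? name + ".gz" : name);
        OutputStream out = Files.newOutputStream(file);
        return compress ? new GZIPOutputStream(out) : out;
    }

    // True if the file starts with the gzip magic number 0x1f 0x8b.
    static boolean isGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 0x1f && in.read() == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        Properties conf = new Properties();
        conf.setProperty(COMPRESS_KEY, "true");
        try (OutputStream out = openPartFile(conf, dir, "part-00000")) {
            out.write("key\tvalue\n".getBytes(StandardCharsets.UTF_8));
        }
        Path part = dir.resolve("part-00000.gz");
        System.out.println(part.getFileName() + " gzip? " + isGzip(part));
    }
}
```

Running main prints `part-00000.gz gzip? true`. A real OutputFormat obtains its compressor through Hadoop's CompressionCodec classes rather than hard-coding GZIPOutputStream; the sketch only illustrates the control flow.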