-
reduce output compression of Terasort
Juwei Shi 2012-02-17, 06:37
Hi,
I am benchmarking a cluster with the Terasort package of Hadoop 0.20.2. I enabled compression for both map output (*mapred.compress.map.output*) and reduce output (*mapred.output.compress*), and I checked in Job.xml that both parameters are true. Compression of the map output works, but compression of the reduce output does not seem to: the output of the job on HDFS is also 1 TB, the same size as the input.
Thanks!
- Juwei
-
Re: reduce output compression of Terasort
Bejoy Ks 2012-02-17, 09:18
Hi Juwei, what is the value of mapred.output.compression.codec? It would be better to determine whether the output files are compressed by checking which codec they use, not just by the size of the files.
Regards Bejoy.K.S
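One way to act on this suggestion is to look at the names of the output files. Hadoop's `CompressionCodecFactory.getCodec(Path)` picks a codec from the file name suffix, so a compressed part file normally carries its codec's default extension. The Python below is a minimal sketch of that lookup, not Hadoop code: the table and the helper names `codec_for` and `uncompressed_parts` are made up for illustration, and the table lists only a few common codecs with the default extensions I believe they use.

```python
# Illustrative suffix table (an assumption, not read from a live cluster):
# default file extension -> codec class that writes it.
DEFAULT_EXTENSIONS = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
}


def codec_for(filename):
    """Return the codec class name matching the file's suffix, or None."""
    for ext, codec in DEFAULT_EXTENSIONS.items():
        if filename.endswith(ext):
            return codec
    return None


def uncompressed_parts(filenames):
    """List the part files whose names match no known codec suffix."""
    return [
        name
        for name in filenames
        if name.startswith("part-") and codec_for(name) is None
    ]


if __name__ == "__main__":
    # Hypothetical directory listing, e.g. names copied from `hadoop fs -ls`.
    listing = ["_logs", "part-00000", "part-00001.lzo_deflate"]
    for name in listing:
        print(name, "->", codec_for(name))
    print("uncompressed:", uncompressed_parts(listing))
```

If every part file comes back with no codec, the job most likely wrote plain output regardless of what the configuration says.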
-
Re: reduce output compression of Terasort
Juwei Shi 2012-02-17, 09:48
We use LZO, so the value is mapred.output.compression.codec = com.hadoop.compression.lzo.LzoCodec
There are no compressed files in HDFS.
- Juwei Shi (史巨伟)
-
Re: reduce output compression of Terasort
bejoy.hadoop@... 2012-02-17, 10:16
Juwei, are there any error messages related to compression in your TaskTracker logs, such as 'Codec not found'?
Regards Bejoy K S
From handheld, Please excuse typos.
-
Re: reduce output compression of Terasort
Binglin Chang 2012-02-17, 12:05
As far as I know, TeraOutputFormat doesn't support compression.
-
Re: reduce output compression of Terasort
Juwei Shi 2012-02-17, 15:21
Binglin,
Thanks a lot for the info, I will check the format.