Patrick Marchwiak 2010-10-08, 20:58
Re: Gzipped input files
Decompression is done by the RecordReader. Text-based input formats use
LineRecordReader, which decompresses automatically. Other record readers
may not (e.g. sequence files, which have their own internal compression).
So it depends on which RecordReader your custom input format uses.
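
To illustrate the idea, here is a minimal sketch in plain Java (JDK only, no
Hadoop classes) of a reader that picks a decompressor from the file name
extension before reading lines. The class and method names are made up for
this example. In Hadoop itself, LineRecordReader does the equivalent lookup
with CompressionCodecFactory.getCodec(path) and wraps the file stream with
codec.createInputStream(...), so a custom RecordReader would need a similar
step if it opens the file on its own.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a record reader; not part of Hadoop.
class ExtensionAwareLineReader {

    // Choose a decompressor from the file name extension.
    // Only ".gz" is handled here; anything else is read as-is.
    static InputStream openForReading(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read every line from the (possibly compressed) stream.
    static List<String> readLines(String fileName, InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(openForReading(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Helper that builds gzipped test data in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("first\nsecond\n");
        // The ".gz" suffix triggers decompression; the lines come back as plain text.
        System.out.println(readLines("part-00000.gz", new ByteArrayInputStream(compressed)));
    }
}
```

If the reader skips the extension check and hands the raw stream to the line
parser, the map function receives compressed bytes, which matches the
symptom described in the question.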


On Fri, Oct 8, 2010 at 1:58 PM, Patrick Marchwiak <[EMAIL PROTECTED]> wrote:
> Hi,
> The Hadoop Definitive Guide book states that "if your input files are
> compressed, they will be automatically decompressed as they are read
> by MapReduce, using the filename extension to determine the codec to
> use" (in the section titled "Using Compression in MapReduce"). I'm
> trying to run a mapreduce job with some gzipped files as input and
> this isn't working. Does support for this have to be built into the
> input format? I'm using a custom one that extends from
> FileInputFormat. Is there an additional configuration option that
> should be set?  I'd like to avoid having to do decompression from
> within my map.
> I'm using the new API and the CDH3b2 distro.
> Thanks.