Patrick Marchwiak 2010-10-08, 20:58
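It's done by the RecordReader, not by the framework itself. For text-based
input formats, which use LineRecordReader, decompression happens
automatically: LineRecordReader picks a codec based on the file's extension
and decompresses the stream as it reads. Other formats don't do this (e.g.
sequence files, which handle compression internally). So it depends on what
your custom input format's RecordReader does.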
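In Hadoop, LineRecordReader looks up the codec with CompressionCodecFactory (roughly `new CompressionCodecFactory(conf).getCodec(path)`) and wraps the file's input stream with `codec.createInputStream(...)` when a codec matches. The snippet below is a minimal, Hadoop-free sketch of that idea using only java.util.zip, so it runs without a cluster. The class and method names are hypothetical, and only the ".gz" suffix is recognized.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hadoop-free sketch of extension-based decompression (hypothetical names).
final class CompressedInputDemo {

    // Choose a decompressor from the file name suffix, as a codec lookup would.
    // Only ".gz" is recognized here; anything else is read as plain bytes.
    static InputStream openMaybeCompressed(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read every line of the (possibly compressed) stream as UTF-8 text.
    static List<String> readLines(String fileName, InputStream raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                openMaybeCompressed(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Helper: gzip a string in memory so the demo needs no files.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("alpha\nbeta\n");
        // Name ends in .gz: lines come back decompressed; prints [alpha, beta]
        System.out.println(readLines("part-00000.gz", new ByteArrayInputStream(data)));
        // Same bytes without the .gz suffix: the reader sees raw gzip bytes, not text.
        System.out.println(readLines("part-00000", new ByteArrayInputStream(data)));
    }
}
```

The second call shows the likely symptom: if a custom RecordReader opens the file directly (e.g. with FileSystem.open) and skips the codec lookup, the mapper receives compressed bytes. Delegating to LineRecordReader, or doing the same CompressionCodecFactory lookup yourself, avoids that.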
On Fri, Oct 8, 2010 at 1:58 PM, Patrick Marchwiak <[EMAIL PROTECTED]> wrote:
> The Hadoop Definitive Guide book states that "if your input files are
> compressed, they will be automatically decompressed as they are read
> by MapReduce, using the filename extension to determine the codec to
> use" (in the section titled "Using Compression in MapReduce"). I'm
> trying to run a mapreduce job with some gzipped files as input and
> this isn't working. Does support for this have to be built into the
> input format? I'm using a custom one that extends from
> FileInputFormat. Is there an additional configuration option that
> should be set? I'd like to avoid having to do decompression from
> within my map.
> I'm using the new API and the CDH3b2 distro.
Patrick Marchwiak 2010-10-08, 23:57