Hadoop, mail # dev - Gzip progress during map phase.


Re: Gzip progress during map phase.
Niels Basjes 2011-12-27, 10:00
I would not expect this. I would expect behaviour that is independent of
the way the splits are created.

--
Kind regards,
Niels Basjes
(Sent from mobile)
On 26 Dec 2011 at 07:57, "Anthony Urso" <[EMAIL PROTECTED]> wrote:

> Gzip files (unlike uncompressed files) are not splittable, which may be
> causing the behavior that you described.
> On Dec 24, 2011 6:24 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I noticed that the mapper progress indication in the Hadoop CDH3
> > distribution jumps from 0% to 100% for each gzipped input file. So when
> > running with big gzipped input files, the job appears to be stuck.
> >
> > I was unable to find a JIRA issue that describes this effect.
> > Before I dive into this, I have a few questions for you:
> > 1) Is this a known effect in the 0.20 version? If so, what is the JIRA
> > issue?
> > 2) Is this specific to gzip?
> > 3) Is this effect still present in the MRv2/YARN version of Hadoop?
> >
> > Thanks.
> > --
> > Kind regards,
> > Niels Basjes
> > (Sent from mobile)
> >
>
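
One way a 0%-to-100% jump like the one described above could arise is sketched below. This is an illustration only, not Hadoop's code: it assumes a reader that computes progress as position divided by split length, and that treats the end of a compressed input as unknown. The names `UNBOUNDED_END`, `position_progress` and `sample_progress` are hypothetical. The sketch compares that naive figure with one based on how many compressed bytes have been consumed from the underlying file, which is independent of how splits are created.

```python
import gzip
import os

# Hypothetical stand-in for "the end of this split is unknown".
UNBOUNDED_END = 2**63 - 1


def position_progress(pos, start, end):
    """Position-based progress in the range [0.0, 1.0]."""
    if end <= start:
        return 0.0
    return min(1.0, (pos - start) / (end - start))


def sample_progress(path, every=1000):
    """Read a gzip file line by line and sample two progress estimates.

    Returns a list of (naive, compressed_based) pairs:
    - naive: uncompressed bytes read, measured against an unbounded end
    - compressed_based: compressed bytes consumed, measured against the
      size of the file on disk
    """
    compressed_size = os.path.getsize(path)
    samples = []
    uncompressed_pos = 0

    def take_sample(raw):
        samples.append((
            position_progress(uncompressed_pos, 0, UNBOUNDED_END),
            position_progress(raw.tell(), 0, compressed_size),
        ))

    with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw) as lines:
        for count, line in enumerate(lines, 1):
            uncompressed_pos += len(line)
            if count % every == 0:
                take_sample(raw)
        # One last sample once the whole stream has been read.
        take_sample(raw)
    return samples
```

Under these assumptions the naive estimate stays at effectively zero for the whole file, so only task completion moves the indicator to 100%. The compressed-position estimate advances in steps, because the decompressor reads the underlying file in chunks, but it never decreases.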