Niels Basjes 2011-12-24, 14:23
Anthony Urso 2011-12-26, 06:56
Niels Basjes 2011-12-27, 10:00
Koji Noguchi 2011-12-27, 11:07
Yes, this is what I was looking for.
Kind regards,
(Sent from mobile)
On 27 Dec 2011 12:08, "Koji Noguchi" <[EMAIL PROTECTED]> wrote:
> Assuming you're using TextInputFormat, it sounds like
> In 0.21. Don't know about CDH.
> On 12/27/11 2:00 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
> > I would not expect this. I would expect behaviour that is independent of
> > the way the splits are created.
> > --
> > Kind regards,
> > Niels Basjes
> > (Sent from mobile)
> > On 26 Dec 2011 07:57, "Anthony Urso" <[EMAIL PROTECTED]> wrote:
> >> Gzip files (unlike uncompressed files) are not splittable, which may be
> >> causing the behavior that you described.
> >> On Dec 24, 2011 6:24 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
> >>> Hi,
> >>> I noticed that the mapper progress indication in the Hadoop CDH3
> >>> distribution jumps from 0% to 100% for each gzipped input file. So when
> >>> running with big gzipped input files, the job appears to be stuck.
> >>> I was unable to find a JIRA issue that describes this effect.
> >>> Before I dive into this, I have a few questions for you guys:
> >>> 1) Is this a known effect for the 0.20 version? If so, what is the JIRA
> >>> issue?
> >>> 2) Is this specific to gzip?
> >>> 3) Is this effect still present in the MRv2/YARN version of Hadoop?
> >>> Thanks.
> >>> --
> >>> Kind regards,
> >>> Niels Basjes
> >>> (Sent from mobile)