Niels Basjes 2011-12-24, 14:23
Anthony Urso 2011-12-26, 06:56
Niels Basjes 2011-12-27, 10:00
Koji Noguchi 2011-12-27, 11:07
Re: Gzip progress during map phase.
Niels Basjes 2011-12-27, 11:42
Yes, this is what I was looking for.
Thanks

--
Kind regards,
Niels Basjes
(Sent from mobile)

On 27 Dec 2011 12:08, "Koji Noguchi" <[EMAIL PROTECTED]> wrote:

> Assuming you're using TextInputFormat, it sounds like
> https://issues.apache.org/jira/browse/MAPREDUCE-773
> In 0.21. Don't know about CDH.
>
> Koji
>
>
> On 12/27/11 2:00 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
>
> > I would not expect this. I would expect behaviour that is independent of
> > the way the splits are created.
> >
> > --
> > Kind regards,
> > Niels Basjes
> > (Sent from mobile)
> > On 26 Dec 2011 07:57, "Anthony Urso" <[EMAIL PROTECTED]> wrote:
> >
> >> Gzip files (unlike uncompressed files) are not splittable, which may be
> >> causing the behavior that you described.
> >> On Dec 24, 2011 6:24 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I noticed that the mapper progress indication in the hadoop cdh3
> >>> distribution jumps from 0% to 100% for each gzipped input file. So when
> >>> running with big gzipped input files the job appears to be stuck.
> >>>
> >>> I was unable to find a jira issue that describes this effect.
> >>> Before I dive into this I have a few questions for you guys:
> >>> 1) is this a known effect for the 0.20 version? If so what is the jira
> >>> issue?
> >>> 2) is this specific to gzip?
> >>> 3) is this effect still present in the MRv2/yarn version of Hadoop?
> >>>
> >>> Thanks.
> >>> --
> >>> Kind regards,
> >>> Niels Basjes
> >>> (Sent from mobile)
> >>>
> >>
> >
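
For readers of the archive, here is a rough sketch of the general idea behind progress reporting on a gzipped input: measure how far the reader has advanced in the compressed file rather than in the decompressed data. This is not Hadoop code and not the MAPREDUCE-773 patch; it is a standalone Python illustration, and the function name `read_gzip_with_progress` and its `chunk_size` parameter are made up for this example. The fraction is approximate, because the gzip reader buffers ahead on the underlying file.

```python
import gzip
import os


def read_gzip_with_progress(path, chunk_size=64 * 1024):
    """Yield (decompressed_chunk, progress) pairs for a gzip file.

    progress is the fraction of *compressed* bytes consumed so far,
    in the range 0.0 to 1.0. Hypothetical helper, for illustration only.
    """
    total = os.path.getsize(path)  # size of the compressed file on disk
    with open(path, "rb") as raw:
        # GzipFile pulls compressed bytes from `raw`, so raw.tell()
        # tracks how much of the compressed input has been read.
        with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
            while True:
                chunk = gz.read(chunk_size)
                if not chunk:
                    break
                progress = raw.tell() / total if total else 1.0
                yield chunk, min(progress, 1.0)
```

A caller would loop over the pairs and report the second element as the progress figure. With this approach a large gzipped file shows intermediate values instead of sitting at 0% until the whole file has been read.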