Hadoop >> mail # dev >> Gzip progress during map phase.


Niels Basjes 2011-12-24, 14:23
Anthony Urso 2011-12-26, 06:56
Niels Basjes 2011-12-27, 10:00
Koji Noguchi 2011-12-27, 11:07

Re: Gzip progress during map phase.
Yes, this is what I was looking for.
Thanks.

--
Kind regards,
Niels Basjes
(Sent from mobile)
On 27 Dec 2011 at 12:08, "Koji Noguchi" <[EMAIL PROTECTED]> wrote:

> Assuming you're using TextInputFormat, this sounds like
> https://issues.apache.org/jira/browse/MAPREDUCE-773
> which is in 0.21. I don't know about CDH.
>
> Koji
>
>
> On 12/27/11 2:00 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
>
> > I would not expect this. I would expect behaviour that is independent of
> > the way the splits are created.
> >
> > --
> > Kind regards,
> > Niels Basjes
> > (Sent from mobile)
> > On 26 Dec 2011 at 07:57, "Anthony Urso" <[EMAIL PROTECTED]> wrote:
> >
> >> Gzip files (unlike uncompressed files) are not splittable, which may be
> >> causing the behavior that you described.
> >> On Dec 24, 2011 6:24 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I noticed that the mapper progress indicator in the Hadoop CDH3
> >>> distribution jumps from 0% to 100% for each gzipped input file. So when
> >>> running with big gzipped input files, the job appears to be stuck.
> >>>
> >>> I was unable to find a JIRA issue that describes this effect.
> >>> Before I dive into this, I have a few questions for you:
> >>> 1) Is this a known effect in the 0.20 version? If so, what is the JIRA
> >>> issue?
> >>> 2) Is this specific to gzip?
> >>> 3) Is this effect still present in the MRv2/YARN version of Hadoop?
> >>>
> >>> Thanks.
> >>> --
> >>> Kind regards,
> >>> Niels Basjes
> >>> (Sent from mobile)
> >>>
> >>
>
>
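
For readers who want to see the effect discussed in this thread outside Hadoop, here is a minimal Python sketch of one way to report progress on a gzipped file: measure how many compressed bytes have been consumed from the underlying file, rather than comparing decompressed offsets with the compressed file size. This is not Hadoop's code and not necessarily how the linked issue was resolved; the function name `read_with_progress` is hypothetical and chosen only for illustration.

```python
import gzip
import os


def read_with_progress(path):
    """Yield (line, progress) pairs for a gzipped text file.

    Progress is the fraction of compressed bytes consumed so far,
    read from the position of the raw file underneath the gzip reader.
    """
    total = os.path.getsize(path)
    with open(path, "rb") as raw:
        # GzipFile pulls compressed bytes from `raw` in buffered chunks,
        # so raw.tell() advances in steps rather than per line.
        with gzip.GzipFile(fileobj=raw, mode="rb") as gz:
            for line in gz:
                if total > 0:
                    progress = min(1.0, raw.tell() / total)
                else:
                    progress = 1.0
                yield line, progress
```

Because the compressed bytes are read in chunks, the reported value is coarse: on a small file it reaches 1.0 almost immediately, while on a large file it climbs in steps as each chunk is consumed. The value never decreases, which is the property a progress indicator needs.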