Hadoop >> mail # dev >> Gzip progress during map phase.


Re: Gzip progress during map phase.
I would not expect this. I would expect behaviour that is independent of
the way the splits are created.

--
Kind regards,
Niels Basjes
(Sent from mobile)
On 26 Dec 2011 at 07:57, "Anthony Urso" <[EMAIL PROTECTED]> wrote:

> Gzip files (unlike uncompressed files) are not splittable, which may be
> causing the behavior that you described.
> On Dec 24, 2011 6:24 AM, "Niels Basjes" <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > I noticed that the mapper progress indication in the Hadoop CDH3
> > distribution jumps from 0% to 100% for each gzipped input file. So when
> > running with big gzipped input files, the job appears to be stuck.
> >
> > I was unable to find a JIRA issue that describes this effect.
> > Before I dive into this, I have a few questions for you:
> > 1) Is this a known effect in the 0.20 version? If so, what is the JIRA
> > issue?
> > 2) Is this specific to gzip?
> > 3) Is this effect still present in the MRv2/YARN version of Hadoop?
> >
> > Thanks.
> > --
> > Kind regards,
> > Niels Basjes
> > (Sent from mobile)
> >
>
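
The expectation voiced in this thread, that progress should not depend on how splits are created, can be illustrated outside Hadoop. The sketch below is not Hadoop code and does not show how any Hadoop record reader works. It only assumes that, for a non-splittable gzip stream, the number of compressed bytes consumed so far is still known and can be divided by the compressed size. The function name `read_with_progress` and its parameters are hypothetical.

```python
import gzip
import io


def read_with_progress(compressed, compressed_size, chunk_size=64 * 1024):
    """Yield (chunk, progress) pairs while decompressing a gzip stream.

    `compressed` is a binary file-like object positioned at the start of the
    gzip data, and `compressed_size` is its length in bytes. `progress` is
    the fraction of compressed bytes consumed so far.
    """
    with gzip.GzipFile(fileobj=compressed, mode="rb") as unzipped:
        while True:
            chunk = unzipped.read(chunk_size)
            if not chunk:
                break
            # tell() on the underlying stream counts compressed bytes read,
            # including the decompressor's read-ahead, so the value is an
            # approximation that can run slightly ahead of the data returned.
            progress = min(1.0, compressed.tell() / compressed_size)
            yield chunk, progress


if __name__ == "__main__":
    # Hypothetical demo data: an in-memory gzip stream instead of a real file.
    blob = gzip.compress(b"line of log data\n" * 200_000)
    for chunk, progress in read_with_progress(io.BytesIO(blob), len(blob)):
        print(f"{progress:.0%} after {len(chunk)} decompressed bytes")
```

Because the decompressor reads the compressed input in buffered blocks, a small or highly compressible input may still report 100% on the first chunk; intermediate values only show up once the compressed data is larger than that buffer.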