HDFS, mail # user - Re: M/R, Strange behavior with multiple Gzip files


x6i4uybz labs 2012-12-06, 14:40
Hello,

The job isn't running in local mode. In fact, I think I just have a problem
with the map task progress.
The counters of each map task update correctly during the job execution,
whereas the progress of each map task stays at 0%.
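One possible reason for counters advancing while progress stays at 0%, offered as an assumption to check against the `LineRecordReader` source of the Hadoop version in use: older readers are believed to set the end of a compressed (non-splittable) file to `Long.MAX_VALUE`, and progress is computed as `(pos - start) / (end - start)`, which then stays near zero. The plain-Java sketch below only reproduces that arithmetic; the class and method names are hypothetical, not Hadoop API.

```java
/**
 * Hypothetical illustration of a record reader's progress value when the
 * split end is unknown. ASSUMPTION: for compressed (non-splittable) input the
 * reader sets end = Long.MAX_VALUE, as some older Hadoop releases are believed
 * to do; verify against the LineRecordReader of your Hadoop version.
 */
final class CompressedProgressSketch {

    /** Progress as (pos - start) / (end - start), capped at 1.0. */
    static float progress(long start, long pos, long end) {
        if (start == end) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    /** Uncompressed split: the end is the real split length, so progress is meaningful. */
    static float uncompressed(long pos, long splitLength) {
        return progress(0L, pos, splitLength);
    }

    /** Compressed file under the assumption above: end is Long.MAX_VALUE. */
    static float compressed(long decompressedBytesRead) {
        return progress(0L, decompressedBytesRead, Long.MAX_VALUE);
    }
}
```

For example, after 500 MB of decompressed text, `compressed(500_000_000L)` is roughly 5e-11, which rounds to 0% in any progress display, while the same position in an uncompressed split of known length gives a meaningful fraction.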

On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Have you configured mapred-site.xml to tell the client where the JobTracker
> is? If not, your job is running with the local job runner, which runs the
> tasks one by one.
>
> JM
>
> PS: I faced the same issue a few weeks ago and got exactly the same
> behaviour. Setting this (above) solved the issue.
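JM's check can be expressed in code. In a real driver you would read the value from Hadoop's `Configuration` (for example `conf.get("mapred.job.tracker", "local")`); the sketch below uses a plain `Map<String, String>` as a stand-in so it runs without Hadoop on the classpath. The property names and their `local` defaults are Hadoop's (`mapred.job.tracker` in 1.x, `mapreduce.framework.name` on YARN); the class and method names are hypothetical.

```java
import java.util.Map;

/**
 * Hypothetical helper for JM's question: does the client configuration point
 * at a real JobTracker/cluster, or at the local runner? A Map stands in for
 * Hadoop's Configuration; both properties default to "local" in Hadoop.
 */
final class LocalModeCheck {

    static final String JOB_TRACKER_KEY = "mapred.job.tracker";     // Hadoop 1.x
    static final String FRAMEWORK_KEY = "mapreduce.framework.name"; // Hadoop 2.x (YARN)

    /** Hadoop 1.x: an unset or "local" mapred.job.tracker means local execution. */
    static boolean isLocalHadoop1(Map<String, String> conf) {
        return "local".equals(conf.getOrDefault(JOB_TRACKER_KEY, "local"));
    }

    /** Hadoop 2.x: an unset or "local" mapreduce.framework.name means local execution. */
    static boolean isLocalHadoop2(Map<String, String> conf) {
        return "local".equals(conf.getOrDefault(FRAMEWORK_KEY, "local"));
    }
}
```

Running the same check in the driver, against the real `Configuration`, is a quick way to confirm that the client actually picked up `mapred-site.xml`.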
>
> 2012/12/6, x6i4uybz labs <[EMAIL PROTECTED]>:
> > Sorry,
> >
> > I wrote an M/R job to process several gz files (about 2000). I have a
> > cluster with 80 map slots.
> > The JobTracker instantiates one map per gz file (gzip isn't splittable,
> > that's OK).
> >
> > The first 80 maps spawn. But after the "initializing" state, it seems only
> > one map is running. When this map finishes, another one starts (not 80
> > maps in parallel), and another map is assigned to the empty slot.
> >
> > I've also noticed that the first maps last more than one hour and the
> > last maps 50 seconds.
> > Each gz file is between 10 MB and 100 MB.
> >
> > I don't understand this behavior.
> > I will launch the job again to see if I get the same issue.
> >
> > thanks, gpo
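With the numbers given in this message (about 2000 gzip files, one map per file, 80 map slots), the expected schedule is easy to estimate. A plain-Java sketch, assuming roughly equal task durations and no other jobs competing for slots; the names are hypothetical and this is not Hadoop's scheduler:

```java
/**
 * Back-of-the-envelope scheduling arithmetic: one map task per gzip file,
 * a fixed number of map slots, each slot running one task at a time.
 * Hypothetical names; not Hadoop's scheduler.
 */
final class MapWaves {

    /** Number of waves needed to run all map tasks. */
    static long waves(long mapTasks, long mapSlots) {
        if (mapSlots <= 0) {
            throw new IllegalArgumentException("mapSlots must be positive");
        }
        return (mapTasks + mapSlots - 1) / mapSlots; // ceiling division
    }

    /** Map tasks expected to run concurrently in the first wave. */
    static long firstWaveConcurrency(long mapTasks, long mapSlots) {
        return Math.min(mapTasks, mapSlots);
    }
}
```

So the job should show 80 maps running at once over about 25 waves. If only one map really runs at a time, the job is effectively executing with a concurrency of 1 rather than 80.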
> >
> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >
> >> Your problem isn't clear in your description - can you please
> >> rephrase/redefine in terms of what you are expecting vs. what you are
> >> observing.
> >>
> >> Also note that Gzip files are not splittable by nature of their codec
> >> algorithm, and hence a TextInputFormat over plain/regular Gzip files
> >> would end up processing each whole Gzip file in one mapper, instead of
> >> using multiple mappers per file.
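The splittability point can be put as split arithmetic. A simplified plain-Java model with hypothetical names: it assumes splits are sized to the block size and treats only the `.gz` suffix as non-splittable, whereas Hadoop's `FileInputFormat` asks the configured compression codecs (matched by file suffix) and also applies minimum/maximum split sizes.

```java
/**
 * Simplified model of how many input splits (hence map tasks) one file
 * produces. ASSUMPTIONS: split size == block size, and only the ".gz"
 * suffix marks a non-splittable codec. Hypothetical names, not Hadoop API.
 */
final class SplitEstimate {

    /** True for files this sketch treats as gzip (non-splittable). */
    static boolean isGzip(String fileName) {
        return fileName.endsWith(".gz");
    }

    static long splitsForFile(String fileName, long fileBytes, long blockBytes) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("blockBytes must be positive");
        }
        if (isGzip(fileName)) {
            // A gzip stream can only be decoded from its start: one mapper reads it all.
            return 1;
        }
        long chunks = (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
        return Math.max(1, chunks); // this sketch counts at least one task per file
    }
}
```

Under this model, 2800 gzip files yield 2800 map tasks whatever the block size, and each map has to decompress its entire file.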
> >>
> >> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs <[EMAIL PROTECTED]> wrote:
> >> > Hi everybody,
> >> >
> >> > I have an M/R job that does a bulk import into HBase.
> >> > I have to process many gzip files (2800 x ~100 MB).
> >> >
> >> > I don't understand why my job instantiates 80 maps but runs each map
> >> > sequentially, as if there were only one big gz file.
> >> >
> >> > Is there a problem in my driver? Or maybe I'm missing something.
> >> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where
> >> > args[0] is a directory.
> >> >
> >> > Can you help me, please?
> >> >
> >> > Thanks, Guillaume
> >>
> >>
> >>
> >> --
> >> Harsh J
> >>
> >
>
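The original question passes a directory to `FileInputFormat.addInputPath(job, new Path(args[0]))`. The plain-Java sketch below approximates, on a local filesystem with `java.nio` rather than HDFS, which files such a directory contributes and how many gzip maps to expect. Assumptions: only the directory's immediate regular files count, and names starting with `_` or `.` are skipped, as `FileInputFormat`'s default hidden-file filter does; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Local-filesystem sketch of the input set behind a directory input path.
 * ASSUMPTIONS: no recursion into subdirectories, and names starting with
 * "_" or "." are skipped like FileInputFormat's hidden-file filter.
 * Uses java.nio, not HDFS; hypothetical names.
 */
final class InputDirSketch {

    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    /** Sorted regular, non-hidden files directly inside dir. */
    static List<Path> inputFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (!isHidden(name) && Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    /** With gzip input, one map task per listed .gz file. */
    static long expectedGzipMaps(Path dir) throws IOException {
        return inputFiles(dir).stream()
                .filter(p -> p.getFileName().toString().endsWith(".gz"))
                .count();
    }
}
```

Comparing `expectedGzipMaps` with the number of map tasks the JobTracker reports is a quick sanity check that the directory holds what the driver expects.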