Hadoop >> mail # user >> M/R, Strange behavior with multiple Gzip files


x6i4uybz labs 2012-12-05, 16:02
Harsh J 2012-12-05, 17:33
Re: M/R, Strange behavior with multiple Gzip files
Thanks for your answers.

I don't have the whole solution yet, but I know:
  - the job is not running on a local TT
  - the map process is very slow
  - and the progress bar is not working properly

So the map tasks are running in parallel (Hadoop works :)), but I don't
understand why the progress of each map task stays at 0%.
On Thu, Dec 6, 2012 at 3:48 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> I tend to agree with Jean-Marc's observation. If your job client logs
> a "LocalJobRunner" at any point, then that is most definitely your
> problem.
>
> Otherwise, if you feel you are facing a scheduling problem, then it
> is most likely your scheduler configuration. For example, the
> FairScheduler has a <maxMaps/> attribute on its pools that you can
> set to control the maximum number of map slots used in parallel by
> jobs in that pool, etc.
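
The pool setting Harsh mentions lives in the FairScheduler allocation file (fair-scheduler.xml in MR1). A minimal sketch, where the pool name "etl" and the limits are example values, not anything from this thread:

```xml
<?xml version="1.0"?>
<!-- Hypothetical allocation file; "etl" and the limits are illustrative. -->
<allocations>
  <pool name="etl">
    <!-- Cap on map slots used concurrently by jobs submitted to this pool. -->
    <maxMaps>80</maxMaps>
    <maxReduces>10</maxReduces>
  </pool>
</allocations>
```
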
>
> On Thu, Dec 6, 2012 at 8:10 PM, x6i4uybz labs <[EMAIL PROTECTED]>
> wrote:
> > Hello,
> >
> > The job isn't running in local mode. In fact, I think I just have a
> > problem with map task progress reporting.
> > The counters of each map task update correctly during job execution,
> > whereas the progress of each map task stays at 0%.
> >
> >
> >
> > On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi,
> >>
> >> Have you configured mapred-site.xml to tell the client where the
> >> JobTracker is? If not, your job is running with the local job
> >> runner, executing the tasks one by one.
> >>
> >> JM
> >>
> >> PS: I faced the same issue a few weeks ago and got the exact same
> >> behaviour. This (above) solved the issue.
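
For reference, the setting Jean-Marc describes is the JobTracker address in mapred-site.xml. A minimal sketch, assuming an MR1 (pre-YARN) cluster; the hostname and port here are placeholders:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Without this (default value "local"), jobs run in the
         LocalJobRunner, which executes tasks one at a time. -->
    <name>mapred.job.tracker</name>
    <value>jthost:8021</value> <!-- placeholder host:port -->
  </property>
</configuration>
```
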
> >>
> >> 2012/12/6, x6i4uybz labs <[EMAIL PROTECTED]>:
> >> > Sorry,
> >> >
> >> > I wrote an M/R job to process several gz files (about 2000). I
> >> > have an 80-map-slot cluster.
> >> > The JT instantiates one map per gz file (not splittable, that's
> >> > OK).
> >> >
> >> > The first 80 maps spawn. But after the "initializing" state, it
> >> > seems only one map is running at a time. When that map finishes,
> >> > another one starts in the freed slot (not 80 maps in parallel).
> >> >
> >> > I've also noticed that the first maps last more than one hour and
> >> > the last maps about 50 seconds.
> >> > Each gz file is between 10 MB and 100 MB.
> >> >
> >> > I don't understand this behavior.
> >> > I will launch the job again to see if I get the same issue.
> >> >
> >> > thanks, gpo
> >> >
> >> >
> >> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Your problem isn't clear in your description - can you please
> >> >> rephrase/redefine in terms of what you are expecting vs. what you are
> >> >> observing.
> >> >>
> >> >> Also note that Gzip files are not splittable by nature of their codec
> >> >> algorithm, and hence a TextInputFormat over plain/regular Gzip files
> >> >> would end up spawning and/or processing one whole Gzip file via one
> >> >> mapper, instead of multiple mappers per file.
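
Harsh's point can be seen with the JDK's own streams: a gzip member must be decoded from its first byte, so there is no valid mid-file offset at which a second mapper could start reading. A small self-contained sketch (the sample text is made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitDemo {

    /** Gzip-compress a string in memory. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(text.getBytes("UTF-8"));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Try to decode a gzip stream starting at the given byte offset. */
    static boolean readableFrom(byte[] compressed, int offset) {
        try {
            ByteArrayInputStream slice =
                new ByteArrayInputStream(compressed, offset, compressed.length - offset);
            GZIPInputStream in = new GZIPInputStream(slice);
            while (in.read() != -1) { /* drain the stream */ }
            return true;
        } catch (IOException e) {
            // The gzip header and DEFLATE state are missing at this offset,
            // which is exactly why a split cannot begin mid-file.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = gzip("line1\nline2\nline3\n");
        System.out.println("from offset 0:  " + readableFrom(data, 0));   // true
        System.out.println("from offset 10: " + readableFrom(data, 10));  // false
    }
}
```

This is why the JT correctly creates one map per gz file here: the codec, not the input format, forbids splitting.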
> >> >>
> >> >> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs
> >> >> <[EMAIL PROTECTED]>
> >> >> wrote:
> >> >> > Hi everybody,
> >> >> >
> >> >> > I have an M/R job which does a bulk import into HBase.
> >> >> > I have to process many gzip files (2800 x ~100 MB).
> >> >> >
> >> >> > I don't understand why my job instantiates 80 maps but runs
> >> >> > each map sequentially, as if there were only one big gz file.
> >> >> >
> >> >> > Is there a problem in my driver? Or maybe I'm missing something.
> >> >> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where
> >> >> args[0]
> >> >> > is a directory.
> >> >> >
> >> >> > Can you help me, please?
> >> >> >
> >> >> > Thanks, Guillaume
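
For context, a driver along the lines Guillaume describes might look roughly like the sketch below. The class and job names are hypothetical, only the `FileInputFormat.addInputPath` call is taken from the thread, and it needs the Hadoop MR1 client libraries on the classpath (not runnable standalone):

```java
// Sketch only: requires the org.apache.hadoop jars; mapper/output setup elided.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class BulkImportDriver { // hypothetical name
    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(BulkImportDriver.class);
        job.setInputFormatClass(TextInputFormat.class);
        // args[0] is a directory: every gz file inside it becomes one
        // (unsplittable) input split, hence one map task per file.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... set mapper, reducer, and HBase output as in the real job ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Passing a directory this way is fine; the map count of ~one per file is expected, so the driver itself is not the cause of the serialized execution.
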
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Harsh J
> >> >>
> >> >
> >
> >
>
>
>
> --
> Harsh J
>
Harsh J 2012-12-06, 16:39