HDFS >> mail # user >> Re: M/R, Strange behavior with multiple Gzip files


x6i4uybz labs 2012-12-06, 09:57
Jean-Marc Spaggiari 2012-12-06, 12:34
x6i4uybz labs 2012-12-06, 14:40
Harsh J 2012-12-06, 14:48
Re: M/R, Strange behavior with multiple Gzip files
If it's common to see 0% -> 100% jumps, then my job is running normally.
That's fine with me. Thanks for your answers.
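
For illustration, here is a minimal sketch of how a 0% -> 100% jump can arise. It assumes the record reader reports progress as (bytes consumed) / (end offset - start offset), and that for a compressed stream the end offset is unknown and stands in as Long.MAX_VALUE. The class and method names are hypothetical; this is not Hadoop's actual code, only the arithmetic behind the symptom.

```java
public class ProgressSketch {

    // Hypothetical helper: fraction of the input consumed, capped at 1.0.
    // start/pos/end are byte offsets into the input.
    static float progress(long start, long pos, long end) {
        if (end == start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        long fileLength = 100L * 1024 * 1024; // a 100 MB input, as in this thread
        long halfway = fileLength / 2;

        // Uncompressed input: the end offset is known, so progress is meaningful.
        System.out.println("known end:   " + progress(0, halfway, fileLength));

        // Assumption: for a compressed stream the end offset is unknown and
        // Long.MAX_VALUE is used as a placeholder. The ratio stays practically
        // zero until the task completes and the framework marks it as 100%.
        System.out.println("unknown end: " + progress(0, halfway, Long.MAX_VALUE));
    }
}
```

Under these assumptions the counters keep moving (records are being read) while the reported percentage stays at 0, which matches what I saw.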

On Thu, Dec 6, 2012 at 5:39 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Ok, I can't tell about the performance of your map process, but it is
> sometimes common to see 0% -> 100% jumps in progress bars when working
> over compressed data - as the progress (in terms of data records
> processed overall) can't be perfectly determined. It might even be a
> bug recently fixed.
>
> If your counters are updating fast enough over the minute, then I'd
> assume all is well. The local job runner concerns come from the
> statements of yours that only one map seems to be running at one time,
> but perhaps that's not the case anymore?
>
> On Thu, Dec 6, 2012 at 9:55 PM, x6i4uybz labs <[EMAIL PROTECTED]>
> wrote:
> > Thanks for your answers.
> >
> > I don't have the whole solution yet, but I know:
> >   - the job is not running on a local TT
> >   - the map process is very slow
> >   - and the progress bar is not working properly
> >
> > So, the map tasks are running in parallel (hadoop works :)) but I don't
> > understand why the progression of each map task stays at 0.
> >
> > On Thu, Dec 6, 2012 at 3:48 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >>
> >> I tend to agree with Jean-Marc's observation. If your job client logs
> >> a "LocalJobRunner" at any point, then that is most definitely your
> >> problem.
> >>
> >> Otherwise, if you feel you are facing a scheduling problem, then it
> >> may most likely be your scheduler configuration. For example,
> >> FairScheduler has a <maxMaps/> attribute over its pools that you can
> >> set to control maximum parallel use of slots for jobs using that pool,
> >> etc.
> >>
> >> On Thu, Dec 6, 2012 at 8:10 PM, x6i4uybz labs <[EMAIL PROTECTED]>
> >> wrote:
> >> > Hello,
> >> >
> >> > The job isn't running in local mode. In fact, I think I just have a
> >> > problem with the map task progression.
> >> > The counters of each map task are OK during the job execution, whereas
> >> > the progression of each map task stays at 0%.
> >> >
> >> >
> >> >
> >> > On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari
> >> > <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Have you configured mapred-site.xml to tell where the JobTracker
> >> >> is? If not, your job is running on the local job runner, running the
> >> >> tasks one by one.
> >> >>
> >> >> JM
> >> >>
> >> >> PS: I faced the same issue a few weeks ago and got the exact same
> >> >> behaviour. This (above) solved the issue.
> >> >>
> >> >> 2012/12/6, x6i4uybz labs <[EMAIL PROTECTED]>:
> >> >> > Sorry,
> >> >> >
> >> >> > I wrote an M/R job to process several gz files (about 2000). I have
> >> >> > an 80 map slot cluster.
> >> >> > The JT instantiates one map per gz file (not splittable, it's OK).
> >> >> >
> >> >> > The first 80 maps spawn. But after the "initializing" state, it seems
> >> >> > there is only one map running. And when this map is finished, another
> >> >> > one starts (not 80 maps in parallel) and another is assigned to the
> >> >> > empty slot.
> >> >> >
> >> >> > I've also noticed that the first maps last more than one hour and
> >> >> > the last maps 50 seconds.
> >> >> > Each gz file is between 10 MB and 100 MB.
> >> >> >
> >> >> > I don't understand the behavior.
> >> >> > I will launch the job again to see if I have the same issue.
> >> >> >
> >> >> > thanks, gpo
> >> >> >
> >> >> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> >> >> >
> >> >> >> Your problem isn't clear in your description - can you please
> >> >> >> rephrase/redefine in terms of what you are expecting vs. what you
> >> >> >> are observing.
> >> >> >>
> >> >> >> Also note that Gzip files are not splittable by nature of their
> >> >> >> codec algorithm, and hence a TextInputFormat over plain/regular Gzip