HDFS, mail # user - Re: M/R, Strange behavior with multiple Gzip files


Re: M/R, Strange behavior with multiple Gzip files
Harsh J 2012-12-06, 14:48
I tend to agree with Jean-Marc's observation. If your job client logs
a "LocalJobRunner" at any point, then that is most definitely your
problem.
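
As a rough way to check this from the configuration side, the sketch
below reads the contents of a mapred-site.xml and reports whether
mapred.job.tracker is unset or set to "local", which on an MRv1 setup
like the one discussed here means the job falls back to the
LocalJobRunner. The class LocalModeCheck, its methods, and the host
jobtracker.example.com:8021 are hypothetical names used only for
illustration; it uses the JDK XML parser rather than any Hadoop API, and
it looks at one file only, so it will not see values overridden in the
driver or on the command line.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of Hadoop): inspects mapred-site.xml content. */
final class LocalModeCheck {

    /** Returns the value of the named property, or null if the XML does not set it. */
    static String readProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (name.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** Trimmed text of the first descendant element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** True when mapred.job.tracker is missing or "local" (the MRv1 default). */
    static boolean isLocalMode(String mapredSiteXml) {
        String jobTracker = readProperty(mapredSiteXml, "mapred.job.tracker");
        return jobTracker == null || "local".equals(jobTracker);
    }

    public static void main(String[] args) {
        // Hypothetical JobTracker address, for illustration only.
        String xml = "<configuration><property>"
                + "<name>mapred.job.tracker</name>"
                + "<value>jobtracker.example.com:8021</value>"
                + "</property></configuration>";
        System.out.println(isLocalMode(xml) ? "local mode" : "cluster mode"); // prints "cluster mode"
    }
}
```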

Otherwise, if you feel you are facing a scheduling problem, then it
is most likely your scheduler configuration. For example,
FairScheduler has a <maxMaps/> setting on its pools that you can use
to cap the number of map slots that jobs in that pool may use in
parallel.
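
To make the effect of such a cap concrete, here is a back-of-the-envelope
sketch of how many "waves" of map tasks a job needs. It is a simplified
model, not Hadoop code: it assumes one map per gz file, a single job on
the cluster, and tasks of equal length. The figures 2000 files and 80
slots come from this thread; the class MapWaves, its methods, and the cap
of 1 are hypothetical and only show what a very low limit would look
like.

```java
/** Hypothetical back-of-the-envelope model of map scheduling (not Hadoop code). */
final class MapWaves {

    /** Maps that can run at once: limited by slots, the pool cap, and pending work. */
    static int concurrentMaps(int clusterMapSlots, int poolMaxMaps, int pendingTasks) {
        return Math.min(pendingTasks, Math.min(clusterMapSlots, poolMaxMaps));
    }

    /** Number of scheduling waves needed to run all tasks (ceiling division). */
    static int waves(int tasks, int concurrent) {
        if (concurrent <= 0) {
            throw new IllegalArgumentException("concurrent must be positive");
        }
        return (tasks + concurrent - 1) / concurrent;
    }

    public static void main(String[] args) {
        int files = 2000; // one map per non-splittable gz file
        int slots = 80;   // map slots in the cluster

        // No pool cap: all 80 slots are usable.
        int uncapped = concurrentMaps(slots, Integer.MAX_VALUE, files);
        System.out.println(waves(files, uncapped)); // prints 25

        // Hypothetical cap of 1: maps run strictly one after another.
        int capped = concurrentMaps(slots, 1, files);
        System.out.println(waves(files, capped)); // prints 2000
    }
}
```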

On Thu, Dec 6, 2012 at 8:10 PM, x6i4uybz labs <[EMAIL PROTECTED]> wrote:
> Hello,
>
> The job isn't running in local mode. In fact, I think I just have a
> problem with the map task progress.
> The counters of each map task are OK during the job execution, whereas
> the progress of each map task stays at 0%.
>
>
>
> On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>>
>> Have you configured mapred-site.xml to tell where the job tracker
>> is? If not, your job is running on the local jobtracker, running the
>> tasks one by one.
>>
>> JM
>>
>> PS: I faced the same issue a few weeks ago and got the exact same
>> behaviour. This (above) solved the issue.
>>
>> 2012/12/6, x6i4uybz labs <[EMAIL PROTECTED]>:
>> > Sorry,
>> >
>> > I wrote an M/R job to process several gz files (about 2000). I have a
>> > cluster with 80 map slots.
>> > The JT instantiates one map per gz file (not splittable, which is OK).
>> >
>> > The first 80 maps spawn. But after the "initializing" state, it seems
>> > there is only one map running. When this map finishes, another one
>> > starts (not 80 maps in parallel) and another is assigned to the empty
>> > slot.
>> >
>> > I've also noticed that the first maps last more than one hour and the
>> > last maps 50 seconds.
>> > Each gz file is between 10 MB and 100 MB.
>> >
>> > I don't understand the behavior.
>> > I will launch the job again to see if I have the same issue.
>> >
>> > thanks, gpo
>> >
>> > On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> >
>> >> Your problem isn't clear in your description - can you please
>> >> rephrase/redefine in terms of what you are expecting vs. what you are
>> >> observing.
>> >>
>> >> Also note that Gzip files are not splittable by nature of their codec
>> >> algorithm, and hence a TextInputFormat over plain/regular Gzip files
>> >> would end up spawning and/or processing one whole Gzip file via one
>> >> mapper, instead of multiple mappers per file.
>> >>
>> >> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs
>> >> <[EMAIL PROTECTED]>
>> >> wrote:
>> >> > Hi everybody,
>> >> >
>> >> > I have an M/R job which does a bulk import to HBase.
>> >> > I have to process many gzip files (2800 x ~100 MB).
>> >> >
>> >> > I don't understand why my job instantiates 80 maps but runs each map
>> >> > sequentially, as if there were only one big gz file.
>> >> >
>> >> > Is there a problem in my driver? Or maybe I am missing something.
>> >> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where
>> >> > args[0] is a directory.
>> >> >
>> >> > Can you help me, please?
>> >> >
>> >> > Thanks, Guillaume
>> >>
>> >>
>> >>
>> >> --
>> >> Harsh J
>> >>
>> >
>
>

--
Harsh J