HDFS, mail # user - Re: M/R, Strange behavior with multiple Gzip files


Re: M/R, Strange behavior with multiple Gzip files
Jean-Marc Spaggiari 2012-12-06, 12:34
Hi,

Have you configured mapred-site.xml to tell the job where the JobTracker
is? If not, your job is running in the local job runner, which runs the
tasks one by one.

JM

PS: I faced the same issue a few weeks ago and got the exact same
behaviour. The fix above solved it.
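
For reference, a minimal sketch of the setting described above, assuming a
Hadoop 1.x (MRv1) client. The property name mapred.job.tracker and its
default value "local" come from Hadoop's own defaults; the host and port
below are placeholders, not values from this thread. The file has to be on
the classpath of the client that submits the job.

```xml
<?xml version="1.0"?>
<!-- mapred-site.xml on the machine that submits the job -->
<configuration>
  <property>
    <!-- Defaults to "local", which runs the job in-process, one task at a time -->
    <name>mapred.job.tracker</name>
    <!-- Placeholder host:port; replace with your JobTracker's address -->
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```

If this property is missing or left at "local", the job never reaches the
cluster's map slots, which would match the one-map-at-a-time behaviour
described in the quoted message.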

2012/12/6, x6i4uybz labs <[EMAIL PROTECTED]>:
> Sorry,
>
> I wrote an M/R job to process several gz files (about 2000). I have a
> cluster with 80 map slots.
> The JT instantiates one map per gz file (not splittable, that's OK).
>
> The first 80 maps spawn. But after the "initializing" state, it seems only
> one map is running. When this map finishes, another one starts (not 80
> maps in parallel), and a new map is assigned to the empty slot.
>
> I've also noticed that the first maps last more than one hour and the last
> maps 50 seconds.
> Each gz file is between 10 MB and 100 MB.
>
> I don't understand the behavior.
> I will launch the job again to see if I have the same issue.
>
> thanks, gpo
>
> On Wed, Dec 5, 2012 at 6:33 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Your problem isn't clear in your description - can you please
>> rephrase/redefine in terms of what you are expecting vs. what you are
>> observing.
>>
>> Also note that Gzip files are not splittable by nature of their codec
>> algorithm, and hence a TextInputFormat over plain/regular Gzip files
>> would end up spawning and/or processing one whole Gzip file via one
>> mapper, instead of multiple mappers per file.
>>
>> On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs <[EMAIL PROTECTED]>
>> wrote:
>> > Hi everybody,
>> >
>> > I have an M/R job which does a bulk import to HBase.
>> > I have to process many gzip files (2800 x ~100 MB).
>> >
>> > I don't understand why my job instantiates 80 maps but runs each map
>> > sequentially, as if there were only one big gz file.
>> >
>> > Is there a problem in my driver? Or maybe I'm missing something.
>> > I use "FileInputFormat.addInputPath(job, new Path(args[0]))", where
>> > args[0] is a directory.
>> >
>> > Can you help me, please?
>> >
>> > Thanks, Guillaume
>>
>>
>>
>> --
>> Harsh J
>>
>