
x6i4uybz labs 2012-12-05, 16:02
Re: M/R, Strange behavior with multiple Gzip files
Your problem isn't clear from your description. Can you please
rephrase it in terms of what you are expecting vs. what you are
actually seeing?

Also note that Gzip files are not splittable, by the nature of their
codec algorithm. A TextInputFormat over plain/regular Gzip files will
therefore process each whole Gzip file in a single mapper, instead of
using multiple mappers per file.
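
To illustrate why a mapper cannot simply start in the middle of a Gzip
file, here is a minimal sketch using only java.util.zip (not Hadoop
itself). The class and method names (GzipSplitDemo, gzip, canDecodeFrom)
are made up for this example. It compresses some rows in memory and then
tries to decode the stream from the start and from an arbitrary midpoint,
which is roughly what a second input split would have to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Hadoop.
public class GzipSplitDemo {

    /** Gzip-compresses the given bytes in memory. */
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns true if a gzip reader can fully decode the bytes from offset onward. */
    public static boolean canDecodeFrom(byte[] gz, int offset) {
        byte[] tail = Arrays.copyOfRange(gz, offset, gz.length);
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(tail))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // Drain the stream; only success or failure matters here.
            }
            return true;
        } catch (IOException e) {
            // Missing gzip header or corrupt deflate data.
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder rows = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            rows.append("row-").append(i).append('\n');
        }
        byte[] gz = gzip(rows.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("decode from start:  " + canDecodeFrom(gz, 0));
        System.out.println("decode from middle: " + canDecodeFrom(gz, gz.length / 2));
    }
}
```

Decoding only works from the beginning of the file, so there is no
useful place to cut a split boundary inside it. That is why each .gz
file goes to a single mapper regardless of its size.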

On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs <[EMAIL PROTECTED]> wrote:
> Hi everybody,
> I have an M/R job which does a bulk import to HBase.
> I have to process many gzip files (2800 x ~100 MB).
> I don't understand why my job instantiates 80 maps but runs each map
> sequentially, as if there were only one big gz file.
> Is there a problem in my driver? Or maybe I am missing something.
> I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where args[0]
> is a directory.
> Can you help me, please?
> Thanks, Guillaume

Harsh J
x6i4uybz labs 2012-12-06, 16:25
Harsh J 2012-12-06, 16:39