Hadoop >> mail # user >> M/R, Strange behavior with multiple Gzip files


x6i4uybz labs 2012-12-05, 16:02
Re: M/R, Strange behavior with multiple Gzip files
Your problem isn't clear from your description. Can you please
rephrase it in terms of what you are expecting vs. what you are
observing?

Also note that Gzip files are not splittable, by the nature of their codec
algorithm. Hence a TextInputFormat over plain/regular Gzip files
would end up processing each whole Gzip file in a single mapper,
instead of spawning multiple mappers per file.
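
To illustrate the splittability point, here is a minimal sketch that uses
only java.util.zip (no Hadoop involved). The class and method names
(GzipSplitDemo, gzip, canReadFrom, sampleData) are made up for this
example. It compresses some text, then tries to decompress once from the
start of the stream and once from the middle. The mid-stream attempt is
expected to fail, which is roughly why a Gzip file cannot be handed to
several mappers at arbitrary byte offsets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop.
class GzipSplitDemo {

    /** Builds some repetitive sample text (assumed data, for illustration only). */
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("row-").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses the given bytes into a single gzip stream. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Returns true if the stream can be fully decompressed starting at the given byte offset. */
    static boolean canReadFrom(byte[] gz, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // Drain the stream; we only care whether decoding succeeds.
            }
            return true;
        } catch (IOException e) {
            // Missing gzip header or corrupt data when starting mid-stream.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] gz = gzip(sampleData());
        System.out.println("from start:  " + canReadFrom(gz, 0));
        System.out.println("from middle: " + canReadFrom(gz, gz.length / 2));
    }
}
```

Running main should report that reading from the start succeeds while
reading from the middle does not. A reader starting at an arbitrary
offset has neither the gzip header nor the decompressor state built up
from the earlier bytes.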

On Wed, Dec 5, 2012 at 9:32 PM, x6i4uybz labs <[EMAIL PROTECTED]> wrote:
> Hi everybody,
>
> I have an M/R job which does a bulk import to HBase.
> I have to process many gzip files (2800 x ~100 MB).
>
> I don't understand why my job instantiates 80 maps but runs each map
> sequentially, as if there were only one big gz file.
>
> Is there a problem in my driver? Or maybe I am missing something.
> I use "FileInputFormat.addInputPath(job, new Path(args[0]))" where args[0]
> is a directory.
>
> Can you help me, please?
>
> Thanks, Guillaume

--
Harsh J
+
x6i4uybz labs 2012-12-06, 16:25
+
Harsh J 2012-12-06, 16:39