-
Map Failure reading .gz (gzip) files
Terry Healy 2013-01-14, 21:25
I'm trying to run a Map-only job using .gz input format. For testing, I have one compressed log file in the input directory. If the file is un-zipped, the code works fine.
Watching the jobs with .gz input via the job tracker shows that the mapper apparently has read the correct number of records (880,000), and it reports 195,357 map output records just as it does if the input file is un-zipped. But it then hangs until I finally kill the job.
Any ideas what I'm missing?
Thanks,
Terry
-
Re: Map Failure reading .gz (gzip) files
bejoy.hadoop@... 2013-01-15, 03:02
Hi Terry
How many map tasks run in each case, with the file compressed and with it uncompressed?
If the file is large, I assume the following is happening.
gz is not a splittable compression codec, so the whole file is processed by a single mapper. That might be causing the job to hang, as one task may not be able to gracefully handle the processing logic on that much data.
When the file is uncompressed, there are multiple map tasks, each handling its own share of the data and running the processing logic gracefully.
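To illustrate the non-splittable point, here is a minimal sketch in plain Python (not Hadoop code) showing that a gzip stream can only be decoded from its first byte, which is why a reader cannot start at an arbitrary split offset. The log lines and the helper name can_decode_from are hypothetical, made up for this example.

```python
import gzip
import zlib


def can_decode_from(blob: bytes, offset: int) -> bool:
    """Return True if gzip decoding succeeds when started at `offset`."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No gzip header at this position, or the stream is unreadable.
        return False


# Hypothetical log data standing in for the real input file.
lines = "".join(
    f"2013-01-14 21:25:00 record {i}\n" for i in range(10000)
).encode()
blob = gzip.compress(lines)

# Decoding from the start of the file versus an arbitrary midpoint.
print("from start:   ", can_decode_from(blob, 0))
print("from midpoint:", can_decode_from(blob, len(blob) // 2))
```

A second reader handed the back half of the file has no gzip header and no decompressor state to begin from, so the framework has little choice but to give the whole .gz file to one mapper.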
Regards Bejoy KS
Sent from remote device, Please excuse typos