Nothing special is required to process .gz files with MR. However, as
Sanjay mentioned, verify that the codecs are configured in core-site.xml.
Another thing to note is that these files are not splittable, so each .gz
file is read in full by a single mapper.
You might want to use bz2 instead; bz2 files are splittable.
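To illustrate the splittability point, here is a minimal JDK-only sketch.
It does not use Hadoop, and the class and method names are made up for this
example. It compresses some text into a single gzip stream, then tries to
decompress it from the beginning and from the middle. A mapper handed a split
that starts mid-file would be in the second situation, which is why a whole
.gz file goes to one mapper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of Hadoop.
public class GzipSplitDemo {

    // Compress a byte array into one gzip stream, like a single .gz file.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Try to decompress everything from the given byte offset onward.
    // Returns false if the decoder rejects the data.
    static boolean canDecompressFrom(byte[] gz, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(gz, offset, gz.length - offset))) {
            byte[] chunk = new byte[4096];
            while (in.read(chunk) != -1) {
                // Drain the stream; we only care whether decoding succeeds.
            }
            return true;
        } catch (IOException e) {
            // No gzip header at this offset, or the deflate data is not decodable.
            return false;
        }
    }

    // Build some sample "log" text and compress it.
    static byte[] sampleGzip() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("log line ").append(i).append('\n');
        }
        return gzip(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] gz = sampleGzip();
        System.out.println("from start:  " + canDecompressFrom(gz, 0));
        System.out.println("from middle: " + canDecompressFrom(gz, gz.length / 2));
    }
}
```

Reading from offset 0 should succeed, while reading from the middle should
fail because there is no gzip header there and the deflate data depends on
what came before it.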
On Wed, Jun 12, 2013 at 10:14 AM, Sanjay Subramanian <
[EMAIL PROTECTED]> wrote:
> No special settings are required for reading Gzip except the ones above.
> If you want to output Gzip:
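> hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");
> hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.GzipCodec");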
> Make sure the Gzip codec is defined in core-site.xml (the io.compression.codecs property).
> <!-- core-site.xml -->
> I have a question:
> Why are you using GZIP as input to Map? These files are not splittable… unless you
> have to read multiple lines (like the lines between a BEGIN and END block in a log
> file) and send them as one record to the mapper.
> Also, among non-splittable codecs, Snappy is better.
> Good luck
> From: samir das mohapatra <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Tuesday, June 11, 2013 9:07 PM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, "
> [EMAIL PROTECTED]" <[EMAIL PROTECTED]>, "
> [EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Now give .gz file as input to the MAP
> Hi All,
> Has anyone worked on how to pass a .gz file as input to a
> MapReduce job?