HDFS, mail # user - Re: Now give .gz file as input to the MAP


Re: Now give .gz file as input to the MAP
Rahul Bhattacharjee 2013-06-12, 04:53
Nothing special is required to process .gz files with MapReduce. However, as
Sanjay mentioned, verify the codecs configured in core-site.xml. Also note
that gzip files are not splittable, so each .gz file is read by a single mapper.

You might want to use bz2 instead; bzip2 files are splittable.
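
To illustrate why a .gz file cannot be split, here is a minimal sketch that uses only the JDK's java.util.zip classes, not Hadoop. The class and method names (GzipSplitSketch, gzipLines, readLinesFrom, canDecodeFrom) are made up for this example. It shows that a gzip stream decodes from its first byte but not from an offset inside the stream, which is the position a second mapper would have to start from.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Hadoop.
class GzipSplitSketch {

    // Compress the given lines into one in-memory gzip stream.
    static byte[] gzipLines(List<String> lines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            for (String line : lines) {
                gz.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        return buffer.toByteArray();
    }

    // Try to decode lines starting at byte `offset` of the compressed data.
    static List<String> readLinesFrom(byte[] compressed, int offset) throws IOException {
        ByteArrayInputStream in =
                new ByteArrayInputStream(compressed, offset, compressed.length - offset);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(in), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // True if the stream can be decoded from `offset`, false if decoding fails.
    static boolean canDecodeFrom(byte[] compressed, int offset) {
        try {
            readLinesFrom(compressed, offset);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            lines.add("record " + i);
        }
        byte[] compressed = gzipLines(lines);

        // Reading from the start of the stream recovers every line.
        System.out.println("from start: " + readLinesFrom(compressed, 0).size() + " lines");

        // Reading from the middle is what a second input split would need to do.
        System.out.println("from middle decodable: "
                + canDecodeFrom(compressed, compressed.length / 2));
    }
}
```

Hadoop handles this by giving each .gz file to one mapper in full, so no extra job configuration is needed, but a single large file gets no parallelism.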

Thanks,
Rahul
On Wed, Jun 12, 2013 at 10:14 AM, Sanjay Subramanian <
[EMAIL PROTECTED]> wrote:

>  hadoopConf.set("mapreduce.job.inputformat.class",
> "com.wizecommerce.utils.mapred.TextInputFormat");
>
> hadoopConf.set("mapreduce.job.outputformat.class",
> "com.wizecommerce.utils.mapred.TextOutputFormat");
>  No special settings are required for reading Gzip beyond those above.
>
>  If you want to output Gzip:
>
>  hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true");
>
> hadoopConf.set("mapreduce.output.fileoutputformat.compress.codec",
> "org.apache.hadoop.io.compress.GzipCodec");
>
> Make sure Gzip codec is defined in core-site.xml
>  <!-- core-site.xml -->
>  <property>
>      <name>io.compression.codecs</name>
>      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec</value>
>  </property>
>
>  I have a question
>
>  Why are you using gzip as input to the map phase? These files are not
> splittable, so gzip makes sense mainly when you have to read multi-line
> records (like the lines between a BEGIN and END block in a log file) and
> send each block to the mapper as one record.
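
The multi-line case mentioned above can be sketched in plain Java, without any Hadoop classes. The class and method names are hypothetical, and the sketch assumes the markers are lines that read exactly BEGIN and END. In a real job this grouping would more likely sit in a custom record reader, which is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
class BlockRecordSketch {

    // Group the lines between each BEGIN and END marker into one record.
    // Lines outside a block are ignored; a block with no END is dropped.
    static List<String> groupBlocks(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // null while outside a block
        for (String line : lines) {
            if (line.equals("BEGIN")) {
                current = new StringBuilder();
            } else if (line.equals("END")) {
                if (current != null) {
                    records.add(current.toString());
                    current = null;
                }
            } else if (current != null) {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "BEGIN", "user=alice", "action=login", "END",
                "BEGIN", "user=bob", "END");
        for (String record : groupBlocks(log)) {
            System.out.println("record: " + record.replace("\n", " | "));
        }
    }
}
```

Because a block may span any byte range of the file, this kind of grouping is easier when one reader sees the whole file in order, which is what a non-splittable input gives you.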
>
>  Also, among the non-splittable codecs, Snappy is the better choice.
>
>  Good Luck
>
>
>  sanjay
>
>   From: samir das mohapatra <[EMAIL PROTECTED]>
> Reply-To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Date: Tuesday, June 11, 2013 9:07 PM
> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>, "
> [EMAIL PROTECTED]" <[EMAIL PROTECTED]>, "
> [EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> Subject: Now give .gz file as input to the MAP
>
>   Hi All,
>     Has anyone worked out how to pass a .gz file as input to a
> MapReduce job?
>
> Regards,
> samir.