MapReduce user mailing list: How does mapreduce job determine the compress codec


Jiayu Ji 2013-12-14, 04:37
Tao Xiao 2013-12-14, 05:16
Re: How does mapreduce job determine the compress codec
Hi Jiayu,
When a SequenceFile is the input, the CompressionCodec class name is
serialized in the file header, so the SequenceFile reader learns the
compression algorithm from the header rather than from the file name.
Thanks.
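
As a minimal sketch of what that looks like in code (the class name and the
command-line path below are just placeholders), SequenceFile.Reader exposes
the codec it read from the header:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.CompressionCodec;

    public class ShowSequenceFileCodec {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path(args[0]);   // any SequenceFile on HDFS or the local FS
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, file, conf);
        try {
          // The codec class comes from the file header, not from the file name.
          CompressionCodec codec = reader.getCompressionCodec();
          System.out.println(codec == null
              ? "not compressed"
              : "codec: " + codec.getClass().getName());
        } finally {
          reader.close();
        }
      }
    }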
On Mon, Dec 16, 2013 at 8:28 AM, Jiayu Ji <[EMAIL PROTECTED]> wrote:

> Thanks Tao. I know I can tell it is an LZO file based on the magic number.
> What I am curious about is which class in Hadoop the mapreduce job uses to
> determine the file's compression algorithm. At the end of the day, I am
> trying to figure out whether all the inputs of a mapreduce job have to be
> compressed with the same algorithm.
>
>
> On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao <[EMAIL PROTECTED]> wrote:
>
>> I suggest you download the LZO-compressed file, no matter whether it has
>> an .lzo extension in its file name, open it as hex bytes with a tool like
>> UltraEdit, and have a look at its header contents.
>>
>>
>> 2013/12/14 Jiayu Ji <[EMAIL PROTECTED]>
>>
>>> Hi
>>>
>>> I have a question about how a mapreduce job determines the compression
>>> codec of its input on HDFS. From what I read in the Definitive Guide (page
>>> 86), "the CompressionCodecFactory provides a way of mapping a filename
>>> extension to a CompressionCodec using its getCodec() method". I did a test
>>> with an LZO-compressed file without an .lzo extension. However, the
>>> mapreduce job was still able to get the right codec. Does anyone know why?
>>> Thanks in advance.
>>>
>>> Jiayu
>>>
>>
>>
>
>
> --
> Jiayu (James) Ji,
>
> Cell: (312)823-7393
>
>
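
On the question above about which class does the file-name mapping for plain
(non-SequenceFile) inputs: TextInputFormat's record reader consults
org.apache.hadoop.io.compress.CompressionCodecFactory, roughly as in the
sketch below (the paths are placeholders, and an LZO codec will only appear
in the factory's extension map if hadoop-lzo is installed and registered,
e.g. via io.compression.codecs):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class GuessCodecFromName {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The factory builds its extension-to-codec map from the codecs
        // configured for this cluster.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        for (String name : new String[] {"/data/f.gz", "/data/f.lzo", "/data/f"}) {
          CompressionCodec codec = factory.getCodec(new Path(name));
          System.out.println(name + " -> "
              + (codec == null ? "no codec matched" : codec.getClass().getName()));
        }
      }
    }

A file with no recognized extension comes back as null here, so when a job
still decompresses such a file correctly, the codec must have been found some
other way, for example in a SequenceFile header as described in the reply
above.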
Jiayu Ji 2013-12-16, 15:29