MapReduce, mail # user - How does mapreduce job determine the compress codec


Re: How does mapreduce job determine the compress codec
Azuryy Yu 2013-12-16, 01:53
Hi Jiayu,
For a SequenceFile input, the CompressionCodec class name is serialized in
the file header, so the SequenceFile reader knows the compression algorithm.
Thanks.
On Mon, Dec 16, 2013 at 8:28 AM, Jiayu Ji <[EMAIL PROTECTED]> wrote:

> Thanks Tao. I know I can tell it is an LZO file based on the magic number.
> What I am curious about is which class in Hadoop the MapReduce job uses to
> determine the file's compression algorithm. Ultimately, I am trying to
> figure out whether all the inputs of a MapReduce job have to be
> compressed with the same algorithm.
>
>
> On Fri, Dec 13, 2013 at 11:16 PM, Tao Xiao <[EMAIL PROTECTED]> wrote:
>
>> I suggest you download the LZO-compressed file, whether or not its file
>> name has a .lzo extension, open it as hex bytes with a tool like
>> UltraEdit, and look at its header contents.
>>
>>
>> 2013/12/14 Jiayu Ji <[EMAIL PROTECTED]>
>>
>>> Hi
>>>
>>> I have a question about how a MapReduce job determines the compression
>>> codec for files on HDFS. From what I read in the Definitive Guide (page
>>> 86), "the CompressionCodecFactory provides a way of mapping a filename
>>> extension to a CompressionCodec using its getCodec() method". I did a test
>>> with an LZO-compressed file without a .lzo extension. However, the
>>> MapReduce job was still able to get the right codec. Does anyone know why?
>>> Thanks in advance.
>>>
>>> Jiayu
>>>
>>
>>
>
>
> --
> Jiayu (James) Ji,
>
> Cell: (312)823-7393
>
>
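The thread touches two different ways of identifying a compression format. One is by filename extension, which is what the Definitive Guide passage says `CompressionCodecFactory.getCodec()` does. The other is by the leading "magic" bytes, which is what Tao's hex-editor check relies on. The Python sketch below puts the two side by side. The extension table, the format labels, and the function names are hypothetical illustrations, not Hadoop's API. The magic values are the standard ones: gzip `1f 8b`, bzip2 `BZh`, lzop `89 4C 5A 4F 00 0D 0A 1A 0A`, and SequenceFile `SEQ`.

```python
from typing import Optional

# Hypothetical subset of suffix -> format labels (not Hadoop's full codec list).
EXTENSIONS = {".gz": "gzip", ".bz2": "bzip2", ".lzo": "lzop"}

# Leading "magic" bytes for the same formats, plus the SequenceFile header.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\x89LZO\x00\r\n\x1a\n", "lzop"),
    (b"SEQ", "sequencefile"),
]


def codec_from_extension(filename: str) -> Optional[str]:
    """Pick a format purely from the filename suffix, in the spirit of getCodec()."""
    matches = [ext for ext in EXTENSIONS if filename.endswith(ext)]
    if not matches:
        return None  # no known suffix: an extension-based lookup finds no codec
    return EXTENSIONS[max(matches, key=len)]  # prefer the longest matching suffix


def codec_from_magic(head: bytes) -> Optional[str]:
    """Identify a format from the first bytes of the file, as in a hex dump."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None


# An lzop file whose name lacks the .lzo suffix:
head = b"\x89LZO\x00\r\n\x1a\n" + b"\x00" * 8
print(codec_from_extension("part-00000"))  # None
print(codec_from_magic(head))              # lzop
```

A lookup keyed on the name gives no codec for a renamed file, while a check on the header bytes still recognizes it. A reader that consults the file contents, as the SequenceFile reader does with its header, therefore does not depend on the extension at all.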