Hive, mail # user - SequenceFile compression on Amazon EMR not very good


Re: SequenceFile compression on Amazon EMR not very good
Saurabh Nanda 2010-02-18, 16:25
Hi Zheng,

I cross-checked. I am setting the following in my Hive script before the
INSERT command:

SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;

A 132 MB (gzipped) input file that goes through a cleanup step and is loaded
into a sequencefile table grows to 432 MB. What could be going wrong?
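
One way to follow up on Zheng's suggestion (block versus record compression) is to decode the header bytes quoted further down in this thread. The sketch below is only a rough check, not Hadoop's own reader: it assumes the version 6 header layout as I understand it (the SEQ magic, a version byte, length-prefixed key and value class names, one byte for "compressed", one byte for "block compressed", then the codec class name), and it only handles single-byte length prefixes. The function name parse_seqfile_header is made up for this example.

```python
def parse_seqfile_header(data: bytes) -> dict:
    """Decode the leading fields of a SequenceFile header (sketch only).

    Assumes the version 6 layout described in the text above and
    class names shorter than 128 bytes (single-byte length prefix).
    """
    if data[:3] != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = data[3]
    pos = 4

    def read_string(offset: int):
        # Length prefix followed by that many UTF-8 bytes.
        length = data[offset]
        if length > 127:
            raise ValueError("multi-byte length prefix not handled in this sketch")
        start = offset + 1
        end = start + length
        return data[start:end].decode("utf-8"), end

    key_class, pos = read_string(pos)
    value_class, pos = read_string(pos)

    # Two flag bytes: "compressed", then "block compressed".
    compressed = data[pos] != 0
    block_compressed = data[pos + 1] != 0
    pos += 2

    codec = None
    if compressed:
        codec, pos = read_string(pos)

    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


# The header from the thread, with caret notation turned into bytes:
# ^F = 0x06, ^Y = 0x19, ^A = 0x01, ^@ = 0x00.
header = (
    b"SEQ\x06"
    b"\"org.apache.hadoop.io.BytesWritable"
    b"\x19org.apache.hadoop.io.Text"
    b"\x01\x00"
    b"'org.apache.hadoop.io.compress.GzipCodec"
)

info = parse_seqfile_header(header)
for field, value in info.items():
    print(f"{field}: {value}")
```

If that layout is right, the ^A^@ pair in the quoted header decodes to compressed = true and block_compressed = false, which would mean the file was written with record compression even though BLOCK was requested. One possible cause, which I have not verified on EMR, is that the output format takes its compression type from a different property (mapred.output.compression.type in Hadoop of this era, as far as I know) rather than io.seqfile.compression.type.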

Saurabh.

On Wed, Feb 3, 2010 at 2:26 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:

> Thanks, Zheng. Will do some more tests and get back.
>
> Saurabh.
>
>
> On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:
>
>> I would first check whether it is really the block compression or
>> record compression.
>> Also, maybe the block size is too small, but I am not sure whether that is
>> tunable in SequenceFile.
>>
>> Zheng
>>
>> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[EMAIL PROTECTED]>
>> wrote:
>> > Hi,
>> >
>> > The size of my gzipped weblog files is about 35 MB. However, upon enabling
>> > block compression and inserting the logs into another Hive table
>> > (sequencefile), the file size bloats up to about 233 MB. I've done similar
>> > processing on a local Hadoop/Hive cluster, and while the compression is not
>> > as good as gzipping, it still is not this bad. What could be going wrong?
>> >
>> > I looked at the header of the resulting file and here's what it says:
>> >
>> >
>> > SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
>> >
>> > Does Amazon Elastic MapReduce behave differently or am I doing something
>> > wrong?
>> >
>> > Saurabh.
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com