Re: SequenceFile compression on Amazon EMR not very good
Saurabh Nanda 2010-02-18, 16:25
I cross checked. I am setting the following in my Hive script before the
A 132 MB (gzipped) input file going through a cleanup and getting populated
in a sequencefile table is growing to 432 MB. What could be going wrong?
On Wed, Feb 3, 2010 at 2:26 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:
> Thanks, Zheng. Will do some more tests and get back.
> On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:
>> I would first check whether it is really the block compression or
>> record compression.
>> Also, maybe the block size is too small, but I am not sure whether that is
>> tunable in SequenceFile.
>> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> > The size of my gzipped weblog files is about 35 MB. However, upon enabling
>> > block compression and inserting the logs into another Hive table
>> > (sequencefile), the file size bloats up to about 233 MB. I've done
>> > processing on a local Hadoop/Hive cluster, and while the compression is not
>> > as good as gzipping, it still is not this bad. What could be going wrong?
>> > I looked at the header of the resulting file and here's what it says:
>> > Does Amazon Elastic MapReduce behave differently or am I doing something
>> > wrong?
>> > Saurabh.
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
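Zheng's suggestion to check whether the output is really block compressed or only record compressed can be answered from the SequenceFile header itself. The following is a minimal Python sketch, not anything posted in the thread. It assumes a SequenceFile of version 5 or later, where the header is the magic bytes `SEQ`, a version byte, the key and value class names, a "compressed" flag, a "block compressed" flag, and then the codec class name if compression is on. The function names (`read_vint`, `read_text`, `parse_seqfile_header`) are hypothetical helpers made up for this illustration.

```python
import io


def read_vint(stream):
    """Decode one Hadoop variable-length integer (WritableUtils encoding)."""
    first = stream.read(1)
    if not first:
        raise EOFError("unexpected end of header")
    b = first[0] - 256 if first[0] > 127 else first[0]  # treat as signed byte
    if b >= -112:
        return b  # small values are stored in the first byte itself
    negative = b < -120
    n_bytes = (-120 - b) if negative else (-112 - b)  # bytes that follow
    value = int.from_bytes(stream.read(n_bytes), "big")
    return ~value if negative else value


def read_text(stream):
    """Read a string written as a vint length followed by UTF-8 bytes."""
    length = read_vint(stream)
    return stream.read(length).decode("utf-8")


def parse_seqfile_header(stream):
    """Return the compression-related fields of a SequenceFile header."""
    magic = stream.read(3)
    if magic != b"SEQ":
        raise ValueError("not a SequenceFile (missing SEQ magic)")
    version = stream.read(1)[0]
    if version < 5:
        # Assumption: older header layouts are out of scope for this sketch.
        raise ValueError("unsupported SequenceFile version %d" % version)
    key_class = read_text(stream)
    value_class = read_text(stream)
    compressed = stream.read(1) != b"\x00"
    block_compressed = stream.read(1) != b"\x00"
    codec = read_text(stream) if compressed else None
    if not compressed:
        kind = "NONE"
    elif block_compressed:
        kind = "BLOCK"
    else:
        kind = "RECORD"
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compression_type": kind,
        "codec": codec,
    }
```

To use it, copy one output file of the table to local disk, open it with `open(path, "rb")`, and pass the file object to `parse_seqfile_header`. A `compression_type` of `RECORD` or `NONE` would mean the block-compression setting did not take effect for that job.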