Hive >> mail # user >> SequenceFile compression on Amazon EMR not very good


Re: SequenceFile compression on Amazon EMR not very good
Hi Zheng,

I cross-checked. I am setting the following in my Hive script before the
INSERT command:

SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;

A 132 MB (gzipped) input file, after going through a cleanup step and being
loaded into a sequencefile table, grows to 432 MB. What could be going wrong?

Saurabh.

On Wed, Feb 3, 2010 at 2:26 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:

> Thanks, Zheng. Will do some more tests and get back.
>
> Saurabh.
>
>
> On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:
>
>> I would first check whether it is really using block compression or
>> record compression.
>> Also, the block size may be too small, but I am not sure whether that
>> is tunable in SequenceFile.
>>
>> Zheng
>>
>> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > The size of my gzipped weblog files is about 35 MB. However, upon enabling
>> > block compression and inserting the logs into another Hive table
>> > (sequencefile), the file size bloats up to about 233 MB. I've done similar
>> > processing on a local Hadoop/Hive cluster, and while the compression is not
>> > as good as gzipping, it still is not this bad. What could be going wrong?
>> >
>> > I looked at the header of the resulting file and here's what it says:
>> >
>> >
>> > SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
>> >
>> > Does Amazon Elastic MapReduce behave differently or am I doing something
>> > wrong?
>> >
>> > Saurabh.
>> > --
>> > http://nandz.blogspot.com
>> > http://foodieforlife.blogspot.com
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>
>
> --
> http://nandz.blogspot.com
> http://foodieforlife.blogspot.com
>

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
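
Zheng's suggestion to check whether the file is really block compressed can
be tested against the header pasted in the original message. What follows is
a minimal sketch in Python, not part of the original thread. It assumes the
standard SequenceFile version 6 header layout (the magic "SEQ", a version
byte, the key and value class names as length-prefixed strings, a
"compressed" flag, a "block compressed" flag, then the codec class name) and
that every length prefix fits in a single byte. The names read_string and
parse_seqfile_header are hypothetical helpers, and the header bytes are
retyped from the paste, reading ^F, ^Y, ^A and ^@ as the control bytes 0x06,
0x19, 0x01 and 0x00.

```python
import io


def read_string(buf):
    """Read a length-prefixed UTF-8 string from the header.

    Assumption: the length fits in one byte (names under 128 bytes), so
    Hadoop's multi-byte variable-length integers are not handled here.
    """
    length = buf.read(1)[0]
    if length > 127:
        raise ValueError("multi-byte length prefix not handled in this sketch")
    return buf.read(length).decode("utf-8")


def parse_seqfile_header(data):
    """Decode the leading fields of a SequenceFile header (version 5 or 6)."""
    buf = io.BytesIO(data)
    if buf.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    version = buf.read(1)[0]
    if version < 5:
        raise ValueError("this sketch assumes header version 5 or later")
    key_class = read_string(buf)
    value_class = read_string(buf)
    compressed = buf.read(1)[0] != 0        # first flag: any compression at all
    block_compressed = buf.read(1)[0] != 0  # second flag: block vs. record
    codec = read_string(buf) if compressed else None
    if not compressed:
        compression = "NONE"
    elif block_compressed:
        compression = "BLOCK"
    else:
        compression = "RECORD"
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compression": compression,
        "codec": codec,
    }


# Header bytes retyped from the message: the printable '"' (0x22) and
# "'" (0x27) characters are the length prefixes 34 and 39.
header = (
    b'SEQ\x06"org.apache.hadoop.io.BytesWritable'
    b"\x19org.apache.hadoop.io.Text"
    b"\x01\x00'org.apache.hadoop.io.compress.GzipCodec"
)

info = parse_seqfile_header(header)
print(info["compression"])  # prints RECORD
```

Read this way, the pasted header has the "compressed" flag set (^A) but the
"block compressed" flag cleared (^@), so the file appears to be record
compressed with GzipCodec rather than block compressed. If that reading is
right, each record is gzipped on its own, which would be consistent with
output much larger than the gzipped input, although the thread does not
confirm the cause. One thing that may be worth trying is
SET mapred.output.compression.type=BLOCK; which is the property Hadoop's
SequenceFileOutputFormat reads. Whether Hive on EMR picks it up is an
assumption here.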