Hive >> mail # user >> SequenceFile compression on Amazon EMR not very good


Re: SequenceFile compression on Amazon EMR not very good
Thanks, Zheng. Will do some more tests and get back.

Saurabh.

On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <[EMAIL PROTECTED]> wrote:

> I would first check whether it is really block compression or record
> compression.
> Also, maybe the block size is too small, but I am not sure whether that is
> tunable in SequenceFile.
>
> Zheng
>
> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > The size of my gzipped weblog files is about 35MB. However, after enabling
> > block compression and inserting the logs into another Hive table
> > (sequencefile), the file size bloats to about 233MB. I've done similar
> > processing on a local Hadoop/Hive cluster, and while the compression is not
> > as good as gzipping, it is still not this bad. What could be going wrong?
> >
> > I looked at the header of the resulting file and here's what it says:
> >
> >
> > SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
> >
> > Does Amazon Elastic MapReduce behave differently or am I doing something
> > wrong?
> >
> > Saurabh.
> > --
> > http://nandz.blogspot.com
> > http://foodieforlife.blogspot.com
> >
>
>
>
> --
> Yours,
> Zheng
>
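Zheng's first suggestion can be checked against the header Saurabh pasted. The sketch below is a minimal Python reader for the start of a SequenceFile header. It assumes the layout Hadoop's `SequenceFile.Writer` uses for header versions 5 and 6: the magic bytes `SEQ`, a version byte, the key and value class names as vint-length-prefixed UTF-8 strings, a compressed flag, a block-compressed flag, and the codec class name when compression is on. Metadata and the sync marker are not parsed. `parse_seqfile_header` and `SeqFileHeader` are hypothetical names, not part of any Hadoop API. Assuming the caret notation in the paste was copied faithfully (`^F` = 0x06, `"` = 0x22 = 34, `^Y` = 0x19 = 25, `^A` = 0x01, `^@` = 0x00, `'` = 0x27 = 39), the length bytes match the three class names exactly. Under that reading, the `^A^@` pair decodes as compressed = true, block-compressed = false, which would mean the file is record-compressed rather than block-compressed.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class SeqFileHeader:
    """Fields from the start of a SequenceFile header (hypothetical helper type)."""
    version: int
    key_class: str
    value_class: str
    compressed: bool
    block_compressed: bool
    codec: Optional[str]


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated SequenceFile header")
    return data


def read_vlong(stream: BinaryIO) -> int:
    """Decode a Hadoop variable-length integer (the WritableUtils vint/vlong encoding)."""
    first = _read_exact(stream, 1)[0]
    if first > 127:
        first -= 256  # reinterpret as a signed Java byte
    if first >= -112:
        return first  # values in [-112, 127] are stored in the single byte itself
    negative = first < -120
    extra = (-120 - first) if negative else (-112 - first)  # number of following bytes
    value = 0
    for b in _read_exact(stream, extra):
        value = (value << 8) | b  # big-endian magnitude
    return ~value if negative else value


def _read_text(stream: BinaryIO) -> str:
    """Read a Text-serialized string: vint byte length, then UTF-8 bytes."""
    length = read_vlong(stream)
    return _read_exact(stream, length).decode("utf-8")


def _read_bool(stream: BinaryIO) -> bool:
    """Read a Java DataOutput boolean (one byte, nonzero means true)."""
    return _read_exact(stream, 1)[0] != 0


def parse_seqfile_header(data: bytes) -> SeqFileHeader:
    """Parse magic, version, class names, compression flags and codec name."""
    stream = io.BytesIO(data)
    if _read_exact(stream, 3) != b"SEQ":
        raise ValueError("missing 'SEQ' magic bytes")
    version = _read_exact(stream, 1)[0]
    if version < 5:
        raise ValueError(f"this sketch only handles header versions 5 and 6, got {version}")
    key_class = _read_text(stream)
    value_class = _read_text(stream)
    compressed = _read_bool(stream)
    block_compressed = _read_bool(stream)
    codec = _read_text(stream) if compressed else None
    return SeqFileHeader(version, key_class, value_class, compressed, block_compressed, codec)


# Header bytes from the thread, with the caret notation turned back into raw bytes.
pasted = (b"SEQ\x06"
          b"\x22org.apache.hadoop.io.BytesWritable"
          b"\x19org.apache.hadoop.io.Text"
          b"\x01\x00"
          b"\x27org.apache.hadoop.io.compress.GzipCodec")
header = parse_seqfile_header(pasted)
print(header.compressed, header.block_compressed, header.codec)
# prints: True False org.apache.hadoop.io.compress.GzipCodec
```

If the flags do come out this way, the job's output compression type is worth checking. In the old `mapred` API, SequenceFile output appears to read it from `mapred.output.compression.type` (`NONE`, `RECORD` or `BLOCK`), falling back to `RECORD` when it is unset.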

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
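One plausible reason record compression inflates output this much for short weblog lines can be shown with Python's standard `gzip` module. Gzip-compressing each record on its own pays the 18-byte gzip header and trailer for every record and cannot exploit repetition across lines. Buffering many records and compressing the buffer as a unit can exploit that repetition. The log lines below are synthetic and hypothetical. The `block_bytes` parameter is only loosely analogous to the buffer Hadoop sizes with `io.seqfile.compress.blocksize`, which bears on Zheng's second question. This mimics the effect, not Hadoop's actual on-disk format, and it does not reproduce the 35MB and 233MB figures from the thread.

```python
import gzip
from typing import List


def synthetic_log_lines(n: int) -> List[bytes]:
    """Hypothetical Apache-style access log lines, for illustration only."""
    return [
        (f'10.0.0.{i % 256} - - [31/Jan/2010:21:{i % 60:02d}:{(i * 7) % 60:02d} +0000] '
         f'"GET /page/{i % 50}.html HTTP/1.1" 200 {1000 + (i * 37) % 5000}').encode("ascii")
        for i in range(n)
    ]


def record_compressed_size(records: List[bytes]) -> int:
    """Total size when every record is gzip-compressed independently."""
    return sum(len(gzip.compress(r)) for r in records)


def block_compressed_size(records: List[bytes], block_bytes: int) -> int:
    """Total size when records are buffered up to block_bytes and each buffer is gzip-compressed."""
    total = 0
    buffer = bytearray()
    for r in records:
        buffer += r
        if len(buffer) >= block_bytes:  # flush a full block
            total += len(gzip.compress(bytes(buffer)))
            buffer.clear()
    if buffer:  # flush the final partial block
        total += len(gzip.compress(bytes(buffer)))
    return total


if __name__ == "__main__":
    lines = synthetic_log_lines(10_000)
    print("raw bytes:        ", sum(len(r) for r in lines))
    print("record-compressed:", record_compressed_size(lines))
    for block in (4_096, 65_536, 1_000_000):
        print(f"block-compressed ({block:>9,} B):", block_compressed_size(lines, block))
```

Running it prints the raw size, the record-compressed total, and block-compressed totals for a few buffer sizes. The exact numbers depend on the synthetic data, so none are quoted here. A `block_bytes` of 1 degenerates to per-record compression, which makes a convenient sanity check.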