HBase >> mail # user >> Scan performance on compressed column families


David Koch 2012-11-03, 14:57
David Koch 2012-11-07, 13:09
Re: Scan performance on compressed column families
Dave,

  I would recommend testing it in your own environment.  As with most
benchmarks, you can find other blogs that argue performance is better with
compression:

http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of

  It will depend on your environment, file size, how well the data
compresses, how taxed your disks are, how wide your cells are, etc.
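
To make the trade-off concrete, here is a rough back-of-the-envelope sketch, not a benchmark. It assumes a deliberately simple model in which reading from disk and decompressing do not overlap, and it ignores the block cache, network, and cell width. The function name `scan_seconds` and all throughput figures are hypothetical numbers chosen for illustration; measure your own.

```python
def scan_seconds(raw_mb, disk_mb_s, ratio=1.0, decompress_mb_s=None):
    """Estimate the seconds needed to scan raw_mb of data.

    ratio           -- compressed size / raw size (1.0 means no compression)
    decompress_mb_s -- decompression throughput in raw MB/s (None = no step)

    Simplifying assumption: disk reads and decompression run one after
    the other, so their times simply add up.
    """
    read_time = raw_mb * ratio / disk_mb_s
    cpu_time = 0.0 if decompress_mb_s is None else raw_mb / decompress_mb_s
    return read_time + cpu_time


# Hypothetical busy or slow disk (100 MB/s): reading fewer bytes wins.
slow_plain = scan_seconds(1000, disk_mb_s=100)
slow_comp = scan_seconds(1000, disk_mb_s=100, ratio=0.5, decompress_mb_s=400)

# Hypothetical fast, idle disk (500 MB/s): the CPU cost dominates.
fast_plain = scan_seconds(1000, disk_mb_s=500)
fast_comp = scan_seconds(1000, disk_mb_s=500, ratio=0.5, decompress_mb_s=400)

print(slow_plain, slow_comp)  # 10.0 7.5
print(fast_plain, fast_comp)  # 2.0 3.5
```

With the same data and codec, the compressed scan comes out ahead on the slow disk and behind on the fast one, which is why the published results disagree and why a test on your own cluster is the only reliable answer.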

On Wed, Nov 7, 2012 at 8:09 AM, David Koch <[EMAIL PROTECTED]> wrote:

> *BUMP*
>
> Sorry,
>
> /David
>
> On Sat, Nov 3, 2012 at 3:57 PM, David Koch <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > Are scans faster when compression is activated? The HBase book by Lars
> > George seems to suggest so (p424, Section on "Compression" in chapter
> > "Performance Tuning").
> >
> > "... compression usually will yield overall better performance, because
> > the overhead of the CPU performing the compression and decompression is
> > less than what is required to read more data from disk."
> >
> > I searched around for a bit and found this:
> > http://gbif.blogspot.fr/2012/02/performance-evaluation-of-hbase.html.
> > The author conducted a series of scan performance tests on tables of up to
> > 200 million rows and found that compression actually slowed down read
> > performance slightly - albeit at lower CPU load.
> >
> > Thank you,
> >
> > /David
> >
>

--
Kevin O'Dell
Customer Operations Engineer, Cloudera
Oliver Meyn 2012-11-09, 19:46
David Koch 2012-11-11, 18:08