HBase >> mail # user >> Scan performance on compressed column families


Re: Scan performance on compressed column families
Hello Oliver,

Thank you for the clarification. As Kevin also pointed out, I guess we will
just have to test compression in our environment.

Regards,

/David
On Fri, Nov 9, 2012 at 8:46 PM, Oliver Meyn (GBIF) <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> I wrote that blog post and I know that Lars George has much more
> experience than me with tuning HBase, especially in different environments,
> so weigh our opinions accordingly.  As he says, it will "usually" help,
> and the unusual cases of lower spec'd hardware (that I did those tests on)
> are where it might hurt scans, but obviously still helps with disk and
> network use.  So take my post with a grain of salt, and as Kevin says, try
> it out on your data and your cluster and see what works best for you.
>
> Cheers,
> Oliver
>
> On 2012-11-03, at 3:57 PM, David Koch wrote:
>
> > Hello,
> >
> > Are scans faster when compression is activated? The HBase book by Lars
> > George seems to suggest so (p424, Section on "Compression" in chapter
> > "Performance Tuning").
> >
> > "... compression usually will yield overall better performance, because the
> > overhead of the CPU performing the compression and decompression is less
> > than what is required to read more data from disk."
> >
> > I searched around for a bit and found this:
> > http://gbif.blogspot.fr/2012/02/performance-evaluation-of-hbase.html.
> > The author conducted a series of scan performance tests on tables of up to
> > 200 million rows and found that compression actually slowed down read
> > performance slightly - albeit at lower CPU load.
> >
> > Thank you,
> >
> > /David
>
>
> --
> Oliver Meyn
> Software Developer
> Global Biodiversity Information Facility (GBIF)
> +45 35 32 15 12
> http://www.gbif.org
>
>
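
The trade-off quoted from the book above can be put into a rough back-of-envelope model. This is only a sketch: the function name, the parameters, and all of the numbers are hypothetical, and it assumes disk reads and decompression do not overlap, which is pessimistic. It is not a substitute for measuring scans on your own data and cluster.

```python
def scan_seconds(data_mb, disk_mb_per_s, compression_ratio=1.0,
                 decompress_mb_per_s=None):
    """Estimate the time to scan data_mb of uncompressed data.

    compression_ratio   -- uncompressed size / on-disk size (assumed value)
    decompress_mb_per_s -- rate at which the CPU produces uncompressed
                           output; None means the data is not compressed
    """
    # Time spent reading the (possibly smaller) on-disk bytes.
    read_time = (data_mb / compression_ratio) / disk_mb_per_s
    # Extra CPU time spent turning those bytes back into the original data.
    cpu_time = 0.0 if decompress_mb_per_s is None else data_mb / decompress_mb_per_s
    return read_time + cpu_time


# Hypothetical figures: 1000 MB of data, 100 MB/s disk, 2x compression.
uncompressed = scan_seconds(1000, 100)
fast_cpu = scan_seconds(1000, 100, 2.0, 400)   # decompression is cheap
slow_cpu = scan_seconds(1000, 100, 2.0, 150)   # lower spec'd hardware

print(uncompressed, fast_cpu, slow_cpu)
```

With these made-up numbers the compressed scan wins when decompression is fast and loses when it is slow, which matches both the "usually" in the book and the result reported for lower spec'd hardware.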