HBase user mailing list: Scan performance on compressed column families


Re: Scan performance on compressed column families
Hello Oliver,

Thank you for the clarification. As Kevin also pointed out, I guess we will
just have to test compression in our environment.

Regards,

/David
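The trade-off in the book passage quoted further down (CPU time spent decompressing versus disk time saved by reading fewer bytes) can be put into a back-of-the-envelope model. This is a hypothetical illustration, not taken from the book or the blog post. It assumes that reading and decompressing happen one after the other with no overlap, that the disk is described by a single sequential throughput figure, and it uses made-up numbers for disk speed, compression ratio and decompression speed.

```python
def scan_seconds(raw_mb, disk_mb_s, ratio=1.0, decompress_mb_s=None):
    """Estimate the time to scan raw_mb megabytes of (uncompressed) data.

    Hypothetical model: disk read time + decompression time, not overlapped.
    raw_mb          -- uncompressed size of the data scanned, in MB
    disk_mb_s       -- sequential disk read throughput, in MB/s
    ratio           -- compression ratio (uncompressed / compressed); 1.0 = none
    decompress_mb_s -- decompression speed in MB/s of output; None = no codec
    """
    read_time = (raw_mb / ratio) / disk_mb_s  # fewer bytes come off the disk
    cpu_time = 0.0 if decompress_mb_s is None else raw_mb / decompress_mb_s
    return read_time + cpu_time


def breakeven_decompress_mb_s(disk_mb_s, ratio):
    """Decompression speed above which compression makes the scan faster.

    Solves raw/decomp < raw/disk - raw/(ratio * disk) for decomp.
    """
    if ratio <= 1.0:
        raise ValueError("compression only saves I/O when ratio > 1")
    return disk_mb_s / (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 MB scanned, 100 MB/s disk, 3:1 compression.
    print(f"uncompressed:         {scan_seconds(10_000, 100):.1f} s")  # 100.0 s
    print(f"compressed, fast CPU: {scan_seconds(10_000, 100, 3.0, 500):.1f} s")  # 53.3 s
    print(f"compressed, slow CPU: {scan_seconds(10_000, 100, 3.0, 100):.1f} s")  # 133.3 s
    print(f"break-even: {breakeven_decompress_mb_s(100, 3.0):.0f} MB/s")  # 150 MB/s
```

With these made-up figures, compression roughly halves the scan time when decompression is fast but makes it slower when the CPU is the bottleneck. That is one way the book's "usually helps" and slower scans on lower-spec'd hardware could both hold. Real scans overlap I/O with CPU work, hit the block cache and decompress whole blocks, so the model only indicates the direction of the trade-off.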
On Fri, Nov 9, 2012 at 8:46 PM, Oliver Meyn (GBIF) <[EMAIL PROTECTED]> wrote:

> Hi David,
>
> I wrote that blog post, and I know that Lars George has much more
> experience than I do with tuning HBase, especially in different
> environments, so weigh our opinions accordingly. As he says, compression
> will "usually" help; the unusual cases, such as the lower-spec'd hardware I
> ran those tests on, are where it might hurt scans, though it still reduces
> disk and network usage. So take my post with a grain of salt and, as Kevin
> says, try it out on your data and your cluster and see what works best for you.
>
> Cheers,
> Oliver
>
> On 2012-11-03, at 3:57 PM, David Koch wrote:
>
> > Hello,
> >
> > Are scans faster when compression is enabled? The HBase book by Lars
> > George seems to suggest so (p. 424, section "Compression" in the chapter
> > "Performance Tuning"):
> >
> > "... compression usually will yield overall better performance, because the
> > overhead of the CPU performing the compression and decompression is less
> > than what is required to read more data from disk."
> >
> > I searched around for a bit and found this:
> > http://gbif.blogspot.fr/2012/02/performance-evaluation-of-hbase.html. The
> > author conducted a series of scan performance tests on tables of up to
> > 200 million rows and found that compression actually slowed down read
> > performance slightly, albeit at lower CPU load.
> >
> > Thank you,
> >
> > /David
>
>
> --
> Oliver Meyn
> Software Developer
> Global Biodiversity Information Facility (GBIF)
> +45 35 32 15 12
> http://www.gbif.org
>
>
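Following the advice to try it on your own data, a quick first check is to measure how well a sample of your values compresses and how fast it decompresses on the hardware in question. This is an assumption-laden stand-in, not an HBase benchmark. It uses Python's standard `zlib` module, which implements DEFLATE, the algorithm underlying gzip and therefore HBase's GZ compression. It runs on a hypothetical sample of row data and measures single-threaded codec speed only. Snappy and LZO are not covered because they are not in the Python standard library.

```python
import time
import zlib


def measure_codec(sample: bytes, level: int = 6, repeats: int = 5):
    """Return (compression ratio, decompression throughput in MB/s) for sample."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    compressed = zlib.compress(sample, level)
    ratio = len(sample) / len(compressed)

    # Time only decompression, since that is the cost paid on every scan.
    start = time.perf_counter()
    for _ in range(repeats):
        restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start

    if restored != sample:
        raise RuntimeError("round trip failed")
    throughput = (len(sample) * repeats / 1e6) / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


if __name__ == "__main__":
    # Hypothetical sample: replace with real cell values exported from your table.
    sample = b"".join(
        b"row-%08d,cf:occurrence,value-%d\n" % (i, i % 1000) for i in range(100_000)
    )
    ratio, mb_s = measure_codec(sample)
    print(f"ratio {ratio:.1f}:1, decompression {mb_s:.0f} MB/s")
```

Comparing the decompression figure with the sequential read throughput of the cluster's disks gives a rough idea of which side of the trade-off a given cluster falls on. An actual scan test against a table with the column family compressed, as Kevin and Oliver suggest, remains the real answer.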