Joey Echeverria 2011-09-14, 17:34
Snappy is built into CDH3u1, so if you go that route, it's the easiest option. As
for raw speed, Snappy is the fastest, but it doesn't have the best
compression ratio. There was an earlier thread where some people noted
faster HBase performance using gzip compared to Snappy. I haven't done
my own testing, so it might be worth trying out those two and
On Wed, Sep 14, 2011 at 11:10 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> We use LZO, as do some of our customers. We have not tried Snappy yet, but all the feedback I have seen about it has been positive.
> Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
> Hadoop ecosystem search :: http://search-hadoop.com/
>>From: Wayne <[EMAIL PROTECTED]>
>>To: [EMAIL PROTECTED]
>>Sent: Wednesday, September 14, 2011 8:33 AM
>>I wanted to do a poll on what compression libraries people are using and
>>why. We currently use LZO but are considering alternatives for various
>>reasons. We would like to move to CDH3, but adding LZO ourselves is a hassle
>>we are not looking to take on. It kind of defeats the purpose of using CDH3
>>to begin with. We currently run 0.20-append.
>>I know there are a lot of variables that affect the best decision, but we
>>are looking for general trends in the community.
>>Is LZO still the most recommended option? Is there any benefit to using the
>>LZO Professional library, and does anyone use it?
>>Is Snappy just as good as LZO and a lot easier to deal with in terms of node
>>Does zlib/gzip have any traction?
>>Compression ratios are important, but as always, performance/speed is our
>>biggest requirement. What are people using and why? Where is the momentum
>>going? Compression is a huge benefit of Hadoop/HBase, and getting high
>>compression ratios with solid performance is a major win.
>>Any recommendations would be appreciated.