HBase, mail # dev - [paper]Solving Big Data Challenges for Enterprise Application Performance Management


Re: [paper]Solving Big Data Challenges for Enterprise Application Performance Management
Jean-Daniel Cryans 2012-08-30, 16:18
On Thu, Aug 30, 2012 at 8:57 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> Did I assist? No. Maybe you mean attend? Likewise, no.

*grin* Yeah, that was a French-ism; I meant attend.

>
> This certainly wasn't advertised per the HN headline, and must have been in
> some other track than what I attended, because this is the first I've heard
> of it. The conference is just over now otherwise I'd track down the authors.

If you look at the title you wouldn't think that it's really just a
poor benchmark of a mismatch of databases. When I think "BigData",
"in-memory databases" doesn't usually come to mind.

I went through the paper only once, but I have a couple of questions
for them if you can track them down:

 - Was dfs.replication really set correctly? In 5.7 they say they set
"no replication" but I'm wondering if they did that correctly in
hbase-site.xml (or put the HDFS configuration on the HBase classpath).
 - Was append really enabled? They publish that our write latency is
0.1ms and we're actually below Redis!
 - Did they do any checks regarding region distribution? As far as I
can tell they don't even know what that is.
 - Did they even try to understand why HBase was failing so often in
their tests? Doesn't that suggest an underlying problem which may
affect the metrics they are collecting?
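
To make the first question concrete, here is a minimal sketch of the
check I have in mind. It is not what the authors ran and it is not
HBase code: it just parses Hadoop-style configuration XML and reports
which dfs.replication a client would end up with. The function names,
the "namenode" host and the "later file wins" merge rule are
assumptions made for illustration (real Hadoop resource loading also
honors "final" properties, which this ignores).

```python
import xml.etree.ElementTree as ET

# Stock HDFS default when no configuration file overrides it.
HDFS_DEFAULT_REPLICATION = 3


def read_properties(xml_text):
    """Return {name: value} from one Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def effective_replication(config_docs):
    """Merge documents in order (later ones win) and return dfs.replication.

    Simplification: 'final' properties and default resources are ignored.
    """
    merged = {}
    for doc in config_docs:
        merged.update(read_properties(doc))
    return int(merged.get("dfs.replication", HDFS_DEFAULT_REPLICATION))


# Hypothetical hbase-site.xml that never mentions dfs.replication.
HBASE_SITE_WITHOUT = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
  </property>
</configuration>"""

# Same file with the HDFS client setting added on the HBase side.
HBASE_SITE_WITH = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:8020/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>"""

print(effective_replication([HBASE_SITE_WITHOUT]))
print(effective_replication([HBASE_SITE_WITH]))
```

The point is that if the setting only lives in an hdfs-site.xml that
HBase never sees, HBase as an HDFS client falls back to the default
and still writes three replicas.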

Not related to HBase but still wondering:

 - In 5.7, what exactly were they smoking when they said that using
compression will decrease throughput?
 - Which consistency level was used for Cassandra?

>
> I attended the Distributed Databases session yesterday. This paper
> presented a multi-datacenter transactional system with concurrency built on
> top of HBase: Serializability, not Serial: Concurrency Control and
> Availability in Multi-Datacenter Datastores
> <http://vldb.org/pvldb/vol5/p1459_stacypatterson_vldb2012.pdf> (Stacy
> Patterson, Aaron J. Elmore, Faisal Nawab, Divyakant Agrawal, Amr El Abbadi).
> So someone out there in academia is using HBase successfully. And the
> presentation was fantastic too, by the way. I also finally was able to
> attend a talk on PBS in person -- Probabilistically Bounded Staleness for
> Practical Partial Quorums
> <http://vldb.org/pvldb/vol5/p776_peterbailis_vldb2012.pdf> (Peter
> Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M.
> Hellerstein, Ion Stoica) -- which is really cool work but made me glad our
> users are working with HBase.

Awesome!

J-D