Accumulo, mail # dev - loggers


Re: loggers
Todd Lipcon 2012-01-30, 18:37
On Mon, Jan 30, 2012 at 10:05 AM, Aaron Cordova
<[EMAIL PROTECTED]> wrote:
>> The big problem is in the fact that writing replicas in HDFS is done in a pipeline, rather than in parallel. There is a ticket to change this (HDFS-1783), but no movement on it since last summer.
>
> ugh - why would they change this? Pipelining maximizes bandwidth usage. It'd be cool if the log stream could be configured to return after written to one, two, or more nodes though.
>

The JIRA proposes to allow "star replication" instead of "pipeline
replication" on a per-stream basis. Pipelining trades off latency for
bandwidth -- multiple RTTs instead of 1 RTT.
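To make that tradeoff concrete, here is a rough back-of-the-envelope model. It is only a sketch under simplifying assumptions: it ignores transmission time and disk syncs, and the function names are made up for illustration rather than taken from HDFS.

```python
def pipeline_ack_latency(hop_rtts_ms):
    """Pipeline: client -> DN1 -> DN2 -> DN3.

    The ack travels back along the same chain, so the per-hop
    round trips add up.
    """
    return sum(hop_rtts_ms)


def star_ack_latency(client_rtts_ms):
    """Star: the client writes to every replica in parallel and
    waits for the slowest ack, so it pays roughly one round trip.
    """
    return max(client_rtts_ms)


def client_egress_bytes(payload_bytes, replicas, star):
    """Bytes the client itself must send for one write.

    In a pipeline the client sends one copy and the datanodes forward it;
    in a star the client sends one copy per replica.
    """
    return payload_bytes * replicas if star else payload_bytes
```

With three replicas and 0.5 ms per hop, this model gives 1.5 ms for the pipeline against 0.5 ms for the star, while the star triples the bytes leaving the client.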

A few other notes relevant to the discussion above (sorry for losing
the quote history):

Regarding HDFS being designed for large sequential writes rather
than small records: that was originally true, but now it's actually
fairly efficient. We have optimizations like HDFS-895 specifically for
the WAL use case which approximate things like group commit, and when
you combine that with group commit at the tablet-server level you can
get very good throughput along with durability guarantees. I haven't
ever benchmarked against Accumulo's Loggers, but I'd be surprised if the
difference were substantial - we tend to be network bound on the WAL
unless the edits are really quite tiny.
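For anyone unfamiliar with group commit, the idea can be sketched roughly as below. This is a single-threaded toy, not the HDFS-895 or tablet-server implementation; `GroupCommitLog` and `sync_fn` are hypothetical names, and `sync_fn` stands in for whatever expensive durable write the real system performs.

```python
class GroupCommitLog:
    """Toy group commit: many appends share one expensive sync."""

    def __init__(self, sync_fn):
        self._sync_fn = sync_fn   # hypothetical durable-write callback
        self._pending = []        # edits appended since the last sync
        self.sync_calls = 0       # how many expensive syncs were issued

    def append(self, edit: bytes) -> None:
        """Buffer an edit; nothing is durable until sync() runs."""
        self._pending.append(edit)

    def sync(self) -> int:
        """Flush every pending edit with one sync call.

        Returns the number of edits covered by this sync.
        """
        if not self._pending:
            return 0
        batch, self._pending = self._pending, []
        self._sync_fn(b"".join(batch))
        self.sync_calls += 1
        return len(batch)
```

Appending 100 small edits and then calling `sync()` once costs a single call to `sync_fn`, which is why the per-edit overhead shrinks as more writers pile onto each sync.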

We're also looking at making our WAL implementation pluggable: see
HBASE-4529. Maybe a similar approach could be taken in Accumulo such
that HBase could use Accumulo loggers, or Accumulo could use HBase's
existing WAL class?
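As a loose illustration of what "pluggable" could mean here, the sketch below hides the log behind a small interface and selects the implementation by name. Everything in it is hypothetical (`WriteAheadLog`, `InMemoryWal`, `make_wal`); it is not the interface proposed in HBASE-4529 or anything that exists in Accumulo.

```python
from abc import ABC, abstractmethod


class WriteAheadLog(ABC):
    """Hypothetical minimal interface a server would code against."""

    @abstractmethod
    def append(self, edit: bytes) -> int:
        """Buffer an edit and return its sequence number."""

    @abstractmethod
    def sync(self) -> None:
        """Make all previously appended edits durable."""


class InMemoryWal(WriteAheadLog):
    """Stand-in implementation; a real one would write to HDFS or loggers."""

    def __init__(self):
        self.unsynced = []
        self.durable = []

    def append(self, edit: bytes) -> int:
        self.unsynced.append(edit)
        return len(self.durable) + len(self.unsynced) - 1

    def sync(self) -> None:
        self.durable.extend(self.unsynced)
        self.unsynced.clear()


def make_wal(kind: str, registry: dict) -> WriteAheadLog:
    """Pick an implementation by a configured name (assumed mechanism)."""
    return registry[kind]()
```

The server would only ever see `append` and `sync`, so swapping one project's log for the other's would come down to registering a different class.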

-Todd
--
Todd Lipcon
Software Engineer, Cloudera