Todd Lipcon 2012-01-30, 19:21
By default we roll every hour regardless of the number of edits. We
also roll every 0.95 x blocksize (fairly arbitrary, but works out to
around 120M usually). And same thing - we trigger flushes when we have
too many logs - but we're much less agressive than you -- our default
here is 32 segments I think. We'll also roll immediately any time
there are less than 3 DNs in the pipeline.
On Mon, Jan 30, 2012 at 11:15 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
> What is "periodically" for closing the log segments? Accumulo basis this
> decision based on the log size, with 1G segments by default.
> Accumulo marks the tablets with the log segment, and initiates flushes when
> the number of segments grows large (3 segments by default). Otherwise,
> tablets with low write-rates, like the metadata table can take forever to
> Does HBase do similar things? Have you experimented with larger/smaller
> On Mon, Jan 30, 2012 at 1:59 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>> On Mon, Jan 30, 2012 at 10:53 AM, Eric Newton <[EMAIL PROTECTED]>
>> > I will definitely look at using HBase's WAL. What did you guys do to
>> > distribute the log-sort/split?
>> In 0.90 the log sorting was done serially by the master, which as you
>> can imagine was slow after a full outage.
>> In 0.92 we have distributed log splitting:
>> I don't think there is a single design doc for it anywhere, but
>> basically we use ZK as a distributed work queue. The master shoves
>> items in there, and the region servers try to claim them. Each item
>> corresponds to a log segment (the HBase WAL rolls periodically). After
>> the segments are split, they're moved into the appropriate region
>> directories, so when the region is next opened, they are replayed.
>> > On Mon, Jan 30, 2012 at 1:37 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>> >> On Mon, Jan 30, 2012 at 10:05 AM, Aaron Cordova
>> >> <[EMAIL PROTECTED]> wrote:
>> >> >> The big problem is in the fact that writing replicas in HDFS is done
>> >> a pipeline, rather than in parallel. There is a ticket to change this
>> >> (HDFS-1783), but no movement on it since last summer.
>> >> >
>> >> > ugh - why would they change this? Pipelining maximizes bandwidth
>> >> It'd be cool if the log stream could be configured to return after
>> >> to one, two, or more nodes though.
>> >> >
>> >> The JIRA proposes to allow "star replication" instead of "pipeline
>> >> replication" on a per-stream basis. Pipelining trades off latency for
>> >> bandwidth -- multiple RTTs instead of 1 RTT.
>> >> A few other notes relevant to the discussion above (sorry for losing
>> >> the quote history):
>> >> Regarding HDFS's being designed for large sequential writes rather
>> >> than small records, that was originally true, but now its actually
>> >> fairly efficient. We have optimizations like HDFS-895 specifically for
>> >> the WAL use case which approximate things like group commit, and when
>> >> you combine that with group commit at the tablet-server level you can
>> >> get very good throughput along with durability guarantees. I haven't
>> >> benchmarked vs Accumulo's Loggers ever, but I'd be surprised if the
>> >> difference were substantial - we tend to be network bound on the WAL
>> >> unless the edits are really quite tiny.
>> >> We're also looking at making our WAL implementation pluggable: see
>> >> HBASE-4529. Maybe a similar approach could be taken in Accumulo such
>> >> that HBase could use Accumulo loggers, or Accumulo could use HBase's
>> >> existing WAL class?
>> >> -Todd
>> >> --
>> >> Todd Lipcon
>> >> Software Engineer, Cloudera
>> Todd Lipcon
>> Software Engineer, Cloudera
Software Engineer, Cloudera