Re: HBase: "small" WAL transactions Q
lars hofhansl 2012-10-03, 03:22
Heh, yes. See HDFS-744 and HBASE-5954.
And re: doMiniBatchMutation in HRegion, it does write multiple Puts (even for different row keys) into a single WALEdit.
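
To make the batching point concrete, here is a minimal toy model in plain Java. It does not use the HBase API: ToyWal, writePerRow and writeBatched are hypothetical names invented for illustration, and "sync" is just a counter. It only shows why putting several rows into one WAL record reduces the number of syncs, assuming one sync per appended record.

```java
import java.util.ArrayList;
import java.util.List;

public class WalBatchingSketch {
    /** Hypothetical stand-in for a WAL: it only counts records, KeyValues and syncs. */
    static final class ToyWal {
        int records = 0;
        int keyValues = 0;
        int syncs = 0;

        // Append one record (think: one WALEdit) and sync once for it.
        void appendAndSync(int keyValueCount) {
            records++;
            keyValues += keyValueCount;
            syncs++;
        }
    }

    /** One record per row: every row pays for its own sync. */
    static ToyWal writePerRow(List<Integer> keyValuesPerRow) {
        ToyWal wal = new ToyWal();
        for (int kvs : keyValuesPerRow) {
            wal.appendAndSync(kvs);
        }
        return wal;
    }

    /** Up to batchSize rows share one record, so they also share one sync. */
    static ToyWal writeBatched(List<Integer> keyValuesPerRow, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        ToyWal wal = new ToyWal();
        for (int start = 0; start < keyValuesPerRow.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keyValuesPerRow.size());
            int kvs = 0;
            for (int i = start; i < end; i++) {
                kvs += keyValuesPerRow.get(i);
            }
            wal.appendAndSync(kvs);
        }
        return wal;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(3); // 10 rows, 3 KeyValues each
        }
        ToyWal perRow = writePerRow(rows);
        ToyWal batched = writeBatched(rows, 4);
        System.out.println("per-row syncs: " + perRow.syncs
                + ", batched syncs: " + batched.syncs);
    }
}
```

The same 30 KeyValues are written either way; only the number of records and syncs changes. Real costs depend on the filesystem and on how HBase groups syncs, which this sketch does not model.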
From: Ted Yu <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, October 2, 2012 7:11 PM
Subject: Re: HBase: "small" WAL transactions Q
That person should have been Lars, I think.
On Tue, Oct 2, 2012 at 7:04 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:
> > Currently HRegion.mutateRowsWithLocks actually acquires
> > locks on all rows first (since the contract here is a transaction),
> > so (currently) you would get unnecessarily reduced concurrency
> > using that API for changes that do not need to be atomic.
> Right, it's about "unnecessarily reduced concurrency" vs "faster writing
> edits to WAL". In case the changes you write do not intersect (do not
> belong to the same row), which I imagine is the most common case when using
> HBase, then it makes sense to choose faster writing to WAL.
> > Also note that a Put(List<Put>) operation already writes multiple
> > updates to a single WALEdit (doing a best effort batching).
> Do you mean the HTable.put(List<Put>) operation? Really? Hm.. Oh, you probably
> mean that updates *that belong to the same row* are written to the WAL
> as a single WALEdit. Yeah, that was a great improvement (esp. w.r.t. to
> If there are no objections, I'd file a JIRA for this idea of "faster writing
> of edits to the WAL" by putting updates to multiple rows into a single WALEdit
> (which essentially is a WAL write transaction).
> Would be great to hear J-D's thoughts: if I remember correctly, he
> mentioned that he tried to do FS sync() on each write to WAL (to ensure
> "real durability"). Again, if I remember correctly this brought quite a lot
> of overhead... which can be reduced by bigger writes to the WAL. Or maybe it
> wasn't J-D who talked about it at the hackathon after HBaseCon?
> On Tue, Oct 2, 2012 at 8:20 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > This is an interesting observation. I have not thought about HBASE-5229 in
> > terms of a performance improvement.
> > Currently HRegion.mutateRowsWithLocks actually acquires locks on all rows
> > first (since the contract here is a transaction), so (currently) you would
> > get unnecessarily reduced concurrency using that API for changes that do
> > not need to be atomic.
> > Also note that a Put(List<Put>) operation already writes multiple updates
> > to a single WALEdit (doing a best effort batching).
> > -- Lars
> > ________________________________
> > From: Alex Baranau <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, October 2, 2012 4:29 PM
> > Subject: HBase: "small" WAL transactions Q
> > Hello,
> > Maybe a silly question.
> > Data in WAL is written in small transactions. One transaction is a set of
> > KeyValues for a specific (single) row. As we want each written transaction
> > to be durable, we write them into the WAL one by one (ideally with FS sync()
> > calls, etc. on each write), which is very costly (doing that for each
> > write).
> > Having bigger WAL transactions (writing changes to several "close" rows)
> > should be more efficient (would result in increased write throughput).
> > I.e. a WALEdit record would contain updates to multiple different rows.
> > As far as I understand, something like that was implemented in HBASE-5229.
> > But it is not the default behavior when sending changes to multiple records to
> > the RS (e.g. when flushing the client-side buffer). It also cannot be forced. What
> > are the major reasons for not using that? Does locking multiple "close" rows
> > look so dangerous? Or is it simply not efficient (there's more to that
> > besides what I described above)?
> > Thank you,
> > Alex Baranau
> > ------
> > Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch -
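
The locking trade-off raised in this thread (a multi-row transaction taking every row lock up front, which blocks unrelated single-row writers on those rows) can be sketched in plain Java. This is not HBase code: RowLockSketch, tryLockAll and unlockAll are hypothetical names, and a one-permit Semaphore per row stands in for a row lock so the example stays single-threaded and deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.Semaphore;

public class RowLockSketch {
    // One non-reentrant lock per row key, created on first use.
    private final Map<String, Semaphore> locks = new HashMap<>();

    private Semaphore lockFor(String row) {
        return locks.computeIfAbsent(row, r -> new Semaphore(1));
    }

    /**
     * Try to lock every row of a transaction. Rows are visited in sorted
     * order; if any row is already held, release what was taken and fail.
     */
    boolean tryLockAll(Collection<String> rows) {
        List<String> acquired = new ArrayList<>();
        for (String row : new TreeSet<>(rows)) {
            if (lockFor(row).tryAcquire()) {
                acquired.add(row);
            } else {
                unlockAll(acquired);
                return false;
            }
        }
        return true;
    }

    /** Release rows previously locked by tryLockAll (caller must hold them). */
    void unlockAll(Collection<String> rows) {
        for (String row : new TreeSet<>(rows)) {
            lockFor(row).release();
        }
    }

    public static void main(String[] args) {
        RowLockSketch region = new RowLockSketch();
        List<String> txRows = Arrays.asList("row1", "row2", "row3");

        // A multi-row transaction holds all three row locks.
        System.out.println("tx locked: " + region.tryLockAll(txRows));
        // A writer that only needs row2 is blocked for the whole transaction.
        System.out.println("row2 writer: " + region.tryLockAll(Arrays.asList("row2")));
        // A writer on an unrelated row is unaffected.
        System.out.println("row9 writer: " + region.tryLockAll(Arrays.asList("row9")));

        region.unlockAll(txRows);
        // Once the transaction releases its locks, row2 is available again.
        System.out.println("row2 writer after unlock: "
                + region.tryLockAll(Arrays.asList("row2")));
    }
}
```

If the grouped changes do not need to be atomic, holding all of the locks together buys nothing and only widens the window in which other writers to those rows must wait, which is the "unnecessarily reduced concurrency" mentioned above.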