Re: HBase: "small" WAL transactions Q
That person should have been Lars, I think.

On Tue, Oct 2, 2012 at 7:04 PM, Alex Baranau <[EMAIL PROTECTED]> wrote:

> > Currently HRegion.mutateRowsWithLocks actually acquires
> > locks on all rows first (since the contract here is a transaction),
> > so (currently) you would get unnecessarily reduced concurrency
> > using that API for changes that do not need to be atomic.
>
> Right, it's about "unnecessarily reduced concurrency" vs "faster writing
> edits to WAL". In case the changes you write do not intersect (do not
> belong to the same row), which I imagine is the most common case when using
> HBase, then it makes sense to choose faster writing to WAL.
>
> > Also note that a Put(List<Put>) operation already writes multiple
> > updates to a single WALEdit (doing a best effort batching).
>
> Do you mean the HTable.put(List<Put>) operation? Really? Hm... Oh, you probably
> mean that updates *that belong to the same row* are written to the WAL
> as a single WALEdit. Yeah, that was a great improvement (esp. w.r.t.
> consistency).
>
> If there are no objections, I'd file a JIRA for this idea of "faster writing
> of edits to WAL" by putting updates to multiple rows into a single WALEdit
> (which essentially is a WAL write transaction).
>
> Would be great to hear J-D's thoughts: if I remember correctly, he
> mentioned that he tried doing an FS sync() on each write to the WAL (to ensure
> "real durability"). Again, if I remember correctly, this brought quite a lot
> of overhead... which can be reduced by bigger writes to the WAL. Or maybe it
> wasn't J-D who talked about it at the hackathon after HBaseCon?
>
> Alex
>
> On Tue, Oct 2, 2012 at 8:20 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>
> > This is an interesting observation. I have not thought about HBASE-5229 in
> > terms of a performance improvement.
> > Currently HRegion.mutateRowsWithLocks actually acquires locks on all rows
> > first (since the contract here is a transaction), so (currently) you would
> > get unnecessarily reduced concurrency using that API for changes that do
> > not need to be atomic.
> >
> >
> > Also note that a Put(List<Put>) operation already writes multiple updates
> > to a single WALEdit (doing a best effort batching).
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Alex Baranau <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED]
> > Sent: Tuesday, October 2, 2012 4:29 PM
> > Subject: HBase: "small" WAL transactions Q
> >
> > Hello,
> >
> > Maybe a silly question.
> >
> > Data in the WAL is written in small transactions. One transaction is a set of
> > KeyValues for a specific (single) row. As we want each written transaction to
> > be durable, we write them into the WAL one by one (ideally with FS sync()
> > calls, etc. on each write), which is very costly.
> >
> > Having bigger WAL transactions (writing changes to several "close" records)
> > should be more efficient (would result in an increase of write throughput).
> > I.e. a WALEdit record would contain updates to multiple different rows.
> > As far as I understand, something like that was implemented in HBASE-5229 [1].
> > But it is not the default behavior when sending changes to multiple records to
> > the RS (e.g. when flushing the client-side buffer). It also cannot be forced.
> > What are the major reasons for not using that? Does locking multiple "close"
> > rows look so dangerous? Or is it simply not efficient (is there more to it
> > besides what I described above)?
> >
> > Thank you,
> > Alex Baranau
> > ------
> > Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
> >
> > [1] https://issues.apache.org/jira/browse/HBASE-5229
> >
>
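
The trade-off discussed in this thread (one sync per single-row transaction versus fewer syncs over larger multi-row records) can be illustrated with a toy model. This is only a sketch and not HBase code: `ToyWal`, `writePerRow`, and `writeBatched` are hypothetical names, the "sync" is just a counter standing in for an FS sync() call, and no row locking is modeled, so the reduced-concurrency cost that Lars mentions does not show up here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy write-ahead log (hypothetical, not HBase): every append is one record plus one simulated sync. */
public class ToyWal {
    private final List<List<String>> records = new ArrayList<>();
    private int syncCount = 0;

    /** Appends a single record holding all given edits, then counts one simulated sync. */
    public void append(List<String> edits) {
        records.add(new ArrayList<>(edits));
        syncCount++; // stands in for an expensive FS sync()
    }

    public int syncCount() { return syncCount; }

    public int recordCount() { return records.size(); }

    /** "Small" transactions: one record, and therefore one sync, per row edit. */
    public static ToyWal writePerRow(List<String> rowEdits) {
        ToyWal wal = new ToyWal();
        for (String edit : rowEdits) {
            wal.append(List.of(edit));
        }
        return wal;
    }

    /** "Bigger" transactions: up to batchSize row edits share one record and one sync. */
    public static ToyWal writeBatched(List<String> rowEdits, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ToyWal wal = new ToyWal();
        for (int i = 0; i < rowEdits.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rowEdits.size());
            wal.append(rowEdits.subList(i, end));
        }
        return wal;
    }

    public static void main(String[] args) {
        List<String> edits = List.of("row1", "row2", "row3", "row4", "row5");
        System.out.println("per-row syncs: " + writePerRow(edits).syncCount());
        System.out.println("batched syncs: " + writeBatched(edits, 2).syncCount());
    }
}
```

In this model the number of syncs drops from one per row edit to one per batch, which is the throughput argument made above. Whether that gain outweighs holding locks on several rows at once is exactly the open question in the thread.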