HBase >> mail # dev >> Re: [Shadow Regions / Read Replicas ] Wal per region?


Re: [Shadow Regions / Read Replicas ] Wal per region?
On Tue, Dec 3, 2013 at 3:07 PM, Enis Söztutar <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 3, 2013 at 2:03 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >
> > > On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh <[EMAIL PROTECTED]> wrote:
> > >
> > > > > Devaraj:
> > > > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality
> > > > > (and hence HDFS short circuit) of reads if you were to couple it
> > > > > with the favored nodes. The cost is of course more WAL files... In
> > > > > the current situation (no WALpr) it would create quite some
> > > > > cross-machine traffic, no?
> > > >
> > > > I think we all agree that a WAL per region isn't efficient in today's
> > > > spinning-hard-drive world, where we are limited to a relatively low
> > > > budget of seeks (though it may be better in the future with SSDs).
> > >
> > > WALpr makes sense in a fully-SSD world, and if HDFS had journaling for
> > > writes. I don't think anybody is working on this yet.
> >
> > What do you mean by journaling for writes? Do you mean where sync
> > operations update the length at the NN on every call?
>
> I think the hdfs guys were using "super sync" to refer to that. I was
> referring to a journaling file system
> (http://en.wikipedia.org/wiki/Journaling_file_system), where the writes to
> multiple files are persisted to a journal disk so that you do not pay the
> constant seeks for writing to a lot of files (the regions' WALs) in
> parallel.
>
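A toy sketch of the batching idea described above, purely for illustration (the `Journal` class and its methods are invented here, not an HDFS or HBase API): writes destined for many logical files are appended to one sequential log, and the seek-heavy fan-out to per-file locations happens later in a single batched pass.

```python
class Journal:
    """Toy journaling sketch: records for many logical files go to one
    append-only log, so each write is a sequential append rather than a
    seek to that file's own location."""

    def __init__(self):
        self.log = []  # stand-in for a single sequential journal disk

    def append(self, file_id, payload):
        # One sequential write, regardless of which logical file it targets.
        self.log.append((file_id, payload))

    def checkpoint(self):
        """Replay the journal into per-file buffers. This is the seek-heavy
        step, now paid once per batch instead of once per write."""
        files = {}
        for file_id, payload in self.log:
            files.setdefault(file_id, []).append(payload)
        self.log.clear()
        return files


j = Journal()
j.append("region-A.wal", b"put k1")
j.append("region-B.wal", b"put k2")
j.append("region-A.wal", b"put k3")
files = j.checkpoint()
```

With a WAL per region, the `file_id`s here would be the per-region WALs; the journal absorbs the parallel write streams so the disk sees only sequential appends.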

Wait, we have a system that provides the ability to write data for a bunch
of buckets to a particular disk before rewriting them to others in a
split-out, read-optimized form...
Isn't this basically what HBase and its HLog provide?  :)

Joking aside, can you give a quick example of the semantics it would have
so I can grok what you are talking about?

Jon

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// [EMAIL PROTECTED]