-
Does HBase do in-memory replication of rows?
MauMau 2010-05-08, 12:16
Hello,

I'm comparing HBase and Cassandra, which I think are the most promising distributed key-value stores, to determine which one to choose for future OLTP and data analysis. I found the following benchmark report by Yahoo! Research, which evaluates HBase, Cassandra, PNUTS, and sharded MySQL:

http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
http://www.brianfrankcooper.net/pubs/ycsb.pdf

The report refers to HBase 0.20.3. While reading it and HBase's documentation, two questions about load balancing and replication have arisen. Could anyone give me any information to help answer them?

[Q2] replication
Does HBase perform in-memory replication of rows like Cassandra? Does HBase sync updates to disk before returning success to clients?

According to the following paragraph in the HBase design overview, HBase syncs writes:

----------------------------------------
Write Requests
When a write request is received, it is first written to a write-ahead log called a HLog. All write requests for every region the region server is serving are written to the same HLog. Once the request has been written to the HLog, the result of changes is stored in an in-memory cache called the Memcache. There is one Memcache for each Store.
----------------------------------------

The source code of the Put class appears to confirm this (though I don't understand the server-side code yet):

private boolean writeToWAL = true;

However, Yahoo's report says the following. Is this incorrect? What is in-memory replication? I know HBase relies on HDFS to replicate data on storage, but not in memory.

----------------------------------------
For Cassandra, sharded MySQL and PNUTS, all updates were synched to disk before returning to the client. HBase does not sync to disk, but relies on in-memory replication across multiple servers for durability; this increases write throughput and reduces latency, but can result in data loss on failure.
----------------------------------------

Maumau
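The write path quoted from the design overview (log first, then the in-memory cache) can be illustrated with a toy, single-process model. This is a hypothetical sketch, not HBase code: `ToyRegionServer`, `put`, and `crashAndRecover` are invented names, and only the `writeToWAL` flag name comes from the `Put` source quoted in the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the write path quoted above; not the HBase API.
// Only the writeToWAL flag name is taken from the Put class.
class ToyRegionServer {
    // One log shared by every region on this server (the "HLog").
    final List<String[]> hlog = new ArrayList<>();
    // Sorted in-memory cache (the "Memcache"); a single Store for brevity.
    Map<String, String> memcache = new TreeMap<>();

    void put(String row, String value, boolean writeToWAL) {
        if (writeToWAL) {
            hlog.add(new String[] {row, value}); // 1. append the edit to the log
        }
        memcache.put(row, value);                // 2. then apply it in memory
    }

    // Simulates a crash: the in-memory cache is lost and rebuilt from the log.
    void crashAndRecover() {
        memcache = new TreeMap<>();
        for (String[] edit : hlog) {
            memcache.put(edit[0], edit[1]);      // replay edits in log order
        }
    }
}
```

In this model, an edit written with `writeToWAL = false` is visible until a crash and then disappears, while a logged edit is restored by replay. That is the reason the flag defaults to `true`.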
-
Re: Does HBase do in-memory replication of rows?
Amandeep Khurana 2010-05-08, 21:39
HBase does not do in-memory replication. Your data goes into a region, which has only one instance. Writes go to the write-ahead log first, which is written to disk. However, since HDFS doesn't yet have fully working flush functionality, there is a chance of losing a chunk of data. The next release of HBase will guarantee data durability, since by then the flush functionality should be fully working.

Regarding replication, the difference between Cassandra and HBase is that a write in Cassandra doesn't return until it has been written to W nodes, where W is configurable. In HBase, replication is handled by the filesystem (HDFS). When the region is flushed to disk, HDFS replicates the HFiles (in which the region's data is stored).

For more details of how this works, read the Bigtable paper and http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
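The Cassandra behaviour described in this reply (a write returns only after W nodes have it) can be sketched with simulated in-process replicas. This is an illustrative model under simplifying assumptions, not Cassandra's code. `QuorumWriteSketch` and `Replica` are hypothetical names, and a real coordinator answers as soon as W acknowledgements arrive rather than contacting replicas one after another.

```java
import java.util.List;

// Hypothetical sketch of a Cassandra-style write with consistency level W.
// Replicas are simulated in-process; no networking, no hinted handoff.
class QuorumWriteSketch {
    static class Replica {
        final boolean up;
        String storedValue;

        Replica(boolean up) { this.up = up; }

        // Returns true (an acknowledgement) only if the replica is reachable.
        boolean write(String value) {
            if (!up) return false;
            storedValue = value;
            return true;
        }
    }

    // Sends the write to every replica and reports success only if at least
    // w of them acknowledged it. (Simplification: replicas are contacted
    // sequentially and all acks are counted before answering.)
    static boolean write(List<Replica> replicas, String value, int w) {
        int acks = 0;
        for (Replica r : replicas) {
            if (r.write(value)) acks++;
        }
        return acks >= w;
    }
}
```

With three replicas and one of them down, a write with W = 2 succeeds and a write with W = 3 fails. In HBase, by contrast, a region is served by a single region server, and copies exist only at the HDFS block level.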
-
Re: Does HBase do in-memory replication of rows?
Ryan Rawson 2010-05-08, 22:10
For more architectural details of HBase, check out the Bigtable paper; it's fairly detailed, short, and accessible.
-
Re: Does HBase do in-memory replication of rows?
MauMau 2010-05-09, 01:10
Thanks Amandeep and Ryan,

I have confirmed that, unlike Cassandra, HBase does not do in-memory replication. So the paragraph below in Yahoo's report is partly incorrect:

For Cassandra, sharded MySQL and PNUTS, all updates were synched to disk before returning to the client. HBase does not sync to disk, but relies on in-memory replication across multiple servers for durability; this increases write throughput and reduces latency, but can result in data loss on failure.

Maumau
-
Re: Does HBase do in-memory replication of rows?
Ryan Rawson 2010-05-09, 01:12
Yes, that section is very misleading. What actually happens is this:

- Every time you write to HBase, the data is written to a Write Ahead Log.
- If there is a regionserver failure, the log is replayed to recover the data.
- Due to an HDFS bug, the data in the most recent log file, which is rotated at 64MB by default, is lost.

The good news is that serious effort is being undertaken to push a version of HDFS without this bug. Hopefully within a week people will be able to download a version of HDFS and not run into this situation.

-ryan
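The three points above can be modelled with a small rolling log. This sketch is hypothetical: `RollingWalSketch` and its methods are invented names, sizes are estimated crudely as one byte per character, and the HDFS bug is modelled only as "the segment that was still open at crash time cannot be read back", which reflects the behaviour described in this message rather than the actual HDFS internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of WAL rolling and replay. Segments are closed once they
// reach a size threshold (64MB by default, per the message above). To mimic the
// HDFS problem described, recovery can read only closed segments; the edits in
// the still-open, most recent segment are lost.
class RollingWalSketch {
    static final long DEFAULT_ROLL_BYTES = 64L * 1024 * 1024;

    final long rollBytes;
    final List<List<String>> closedSegments = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long currentBytes = 0;

    RollingWalSketch() { this(DEFAULT_ROLL_BYTES); }

    RollingWalSketch(long rollBytes) { this.rollBytes = rollBytes; }

    void append(String edit) {
        current.add(edit);
        currentBytes += edit.length();      // crude size estimate: 1 byte per char
        if (currentBytes >= rollBytes) {    // roll: close this segment, open a new one
            closedSegments.add(current);
            current = new ArrayList<>();
            currentBytes = 0;
        }
    }

    // Edits a new region server could replay after the old one crashed.
    List<String> replayAfterCrash() {
        List<String> recovered = new ArrayList<>();
        for (List<String> segment : closedSegments) {
            recovered.addAll(segment);
        }
        return recovered;                   // edits in `current` are not recovered
    }
}
```

With a 4-byte threshold, appending "ab", "cd", and "ef" closes the first segment after "cd", so replay recovers "ab" and "cd" but loses "ef". With the real 64MB default, the window of unrecoverable edits can be much larger.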
-
Re: Does HBase do in-memory replication of rows?
Todd Lipcon 2010-05-09, 03:28
I think the point they were trying to make in the YCSB paper is that, even with the WAL and hflush(), the data does not get synced to disk. hflush() only ensures that the WAL data has made it to three datanodes' OS caches; it doesn't actually guarantee that anything is on physical media.

I agree it's not clearly articulated, but that's what's happening. The API that will cause fsync() on the DNs is called hsync(), and it has not been written yet.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
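The distinction between hflush() and hsync() has a single-machine counterpart: handing bytes to the operating system versus forcing them onto the physical device. The sketch below is only a local-file analogy and is not the HDFS API. `FlushVsSyncSketch` and `appendRecord` are hypothetical names. The calls it uses (`OutputStream.flush()`, `FileChannel.force(boolean)`) are standard `java.io` and `java.nio` methods.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Local-file analogy for the hflush()/hsync() distinction discussed above.
// NOT the HDFS API: it only shows two different levels of "flushed".
class FlushVsSyncSketch {
    static void appendRecord(Path log, String record, boolean sync) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(log.toFile(), true);
             BufferedOutputStream out = new BufferedOutputStream(fos)) {
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
            // Analogous to hflush(): bytes leave the process and reach the OS
            // page cache. They survive a process crash, but not necessarily a
            // power failure or kernel crash.
            out.flush();
            if (sync) {
                // Analogous to hsync(): ask the OS to write the file's content
                // (and metadata) through to the storage device.
                fos.getChannel().force(true);
            }
        }
    }
}
```

The durability gap Todd describes is the gap between the `flush()` line and the `force(true)` line. In HDFS, the flushed data sits in the page caches of several datanodes rather than one machine, which is what makes it survive single-node process failures but not a correlated power loss.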
-
Re: Does HBase do in-memory replication of rows?
Ryan Rawson 2010-05-09, 03:35
I think HDFS-200 does call fflush, so it will get to the OS buffers...

-ryan
-
Re: Does HBase do in-memory replication of rows?
Andrew Purtell 2010-05-09, 17:18
Others have followed up on the central question, which is about durability, and have pointed out that the text is misleading. However, more generally, regarding the question "Does HBase do in-memory replication of rows?":

HBase will have a replication feature in the next release, independent of HDFS-layer data block replication:

HBASE-1295: https://issues.apache.org/jira/browse/HBASE-1295

This is cluster-to-cluster replication, at the HBase layer, and at a finer granularity than the row.

HBase may also, in the future, evolve an optional extension to the BigTable architecture:

HBASE-2357: https://issues.apache.org/jira/browse/HBASE-2357

and this, I think, also meets the definition of in-memory replication. While HBASE-2357 talks about availability, I see it as a means of offering higher read scalability for some use cases that can accept a relaxation of HBase's ACID guarantees.

So an answer to "Does HBase do in-memory replication of rows?" is also, in part: actually, we might do that, independent of providing durability by other means.

- Andy
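Cluster-to-cluster replication at a granularity finer than the row can be pictured as shipping individual cell edits from one cluster to another. The sketch below makes several assumptions that are not stated in this message and are not taken from HBASE-1295: edits are single cells, replication is enabled per column family, and shipping happens asynchronously after the source write completes. All names (`LogShippingSketch`, `CellEdit`, `Cluster`, `shipPending`) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of asynchronous cluster-to-cluster replication.
// Assumptions (not from HBASE-1295): cell-level edits, per-family opt-in,
// and a background shipper that applies queued edits to the peer later.
class LogShippingSketch {
    // One cell-level edit: finer-grained than a whole row.
    static final class CellEdit {
        final String row, family, qualifier, value;

        CellEdit(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        String key() { return row + "/" + family + ":" + qualifier; }
    }

    static final class Cluster {
        final Map<String, String> cells = new HashMap<>();

        void apply(CellEdit e) { cells.put(e.key(), e.value); }
    }

    final Cluster source = new Cluster();
    final Cluster peer = new Cluster();
    final Set<String> replicatedFamilies;
    final Queue<CellEdit> shippingQueue = new ArrayDeque<>();

    LogShippingSketch(Set<String> replicatedFamilies) {
        this.replicatedFamilies = replicatedFamilies;
    }

    // The client write completes on the source cluster; the peer is not awaited.
    void write(CellEdit e) {
        source.apply(e);
        if (replicatedFamilies.contains(e.family)) {
            shippingQueue.add(e);
        }
    }

    // Run later by a background shipper; until then the peer lags behind.
    void shipPending() {
        CellEdit e;
        while ((e = shippingQueue.poll()) != null) {
            peer.apply(e);
        }
    }
}
```

Because the peer is updated only when `shipPending()` runs, this kind of replication complements the WAL rather than replacing it as a durability mechanism. That matches Andy's point that it is independent of providing durability by other means.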