|
Mohit Anchlia
2011-07-07, 18:12
Andrew Purtell
2011-07-07, 18:22
Mohit Anchlia
2011-07-07, 18:53
Himanshu Vashishtha
2011-07-07, 18:59
Mohit Anchlia
2011-07-07, 19:05
Stack
2011-07-07, 19:09
Andrew Purtell
2011-07-07, 19:11
Mohit Anchlia
2011-07-07, 19:30
Doug Meil
2011-07-07, 19:38
Andrew Purtell
2011-07-07, 20:53
Mohit Anchlia
2011-07-07, 21:01
Buttler, David
2011-07-07, 21:43
Mohit Anchlia
2011-07-07, 22:02
Andrew Purtell
2011-07-07, 22:12
Mohit Anchlia
2011-07-07, 22:17
Arvind Jayaprakash
2011-07-11, 13:34
Andrew Purtell
2011-07-11, 16:20
Ted Dunning
2011-07-11, 16:47
Joey Echeverria
2011-07-11, 18:22
Ted Dunning
2011-07-11, 18:57
Joey Echeverria
2011-07-11, 19:37
M. C. Srivas
2011-07-11, 19:48
Luke Lu
2011-07-11, 20:33
Ted Dunning
2011-07-11, 20:47
Stack
2011-07-12, 03:45
Ted Dunning
2011-07-12, 04:54
Doug Meil
2011-07-07, 19:13
|
-
Hbase performance with HDFS - Mohit Anchlia 2011-07-07, 18:12
I've been trying to understand how HBase can provide good performance on top of HDFS when HDFS is designed for sequential access with large block sizes. That is inherently different from HBase, where access is more random and rows might be very small.

I am reading the article below, but it doesn't answer my question. It does say that the HFile block size is different, but how that really works with HDFS is what I am trying to understand.

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
Mohit Anchlia 2011-07-07, 18:12
-
Re: Hbase performance with HDFS - Andrew Purtell 2011-07-07, 18:22
Hi Mohit,

Start here: http://labs.google.com/papers/bigtable.html

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Andrew Purtell 2011-07-07, 18:22
-
Re: Hbase performance with HDFS - Mohit Anchlia 2011-07-07, 18:53
I have looked at Bigtable and its SSTables etc., but my question is directly related to how it's used with HDFS. HDFS recommends large files, bigger blocks, and write-once, read-many sequential access. But reading and writing small rows is more random and different from the inherent design of HDFS. How do these two go together and still provide good performance?
Mohit Anchlia 2011-07-07, 18:53
-
Re: Hbase performance with HDFS - Himanshu Vashishtha 2011-07-07, 18:59
Mohit,
just like how SSTables are stored on GFS?
BigTable SSTable => HBase HFile.

Does this help?
Himanshu
Himanshu Vashishtha 2011-07-07, 18:59
-
Re: Hbase performance with HDFS - Mohit Anchlia 2011-07-07, 19:05
I understand that, but like I mentioned, the properties of HDFS are a lot different from the requirements of HBase in terms of access patterns, row sizes, write sizes, lots of updates, deletes, etc. So I was trying to understand how they are able to work together and still give the desired performance.
Mohit Anchlia 2011-07-07, 19:05
-
Re: Hbase performance with HDFS - Stack 2011-07-07, 19:09
Writing, we dump files out that are close to HDFS block size when we flush.

Reading, the files have an index so we'll know where to seek to in HDFS to find a particular value. We then pull in a block of the HDFS file -- 64k (not 64MB) -- into memory and keep it in an LRU block cache in case subsequent reads are for the same block. We always read 64k from HDFS (though this is configurable, and some report better random read perf when blocks smaller than 64k are used).

What else do you want to know?
St.Ack
Stack 2011-07-07, 19:09
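The read path Stack describes (look up the block in an index, fetch that one block, keep it in an LRU cache) can be sketched roughly as below. This is an illustration only, not HBase's actual code: the class `BlockCacheSketch`, its methods, and the `blockLoader` callback are hypothetical names, and the loader stands in for whatever actually fetches the 64k block from HDFS.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongFunction;

/** Hypothetical sketch: block index lookup plus an LRU cache of loaded blocks. */
final class BlockCacheSketch {
    // Index: first row key stored in a block -> byte offset of that block in the file.
    private final TreeMap<String, Long> index = new TreeMap<>();
    // Assumption: stands in for the read of one block (64k by default) from HDFS.
    private final LongFunction<byte[]> blockLoader;
    // Access-ordered LinkedHashMap that evicts the least recently used block.
    private final Map<Long, byte[]> cache;
    int loads = 0; // number of times the loader was actually called

    BlockCacheSketch(final int maxBlocks, LongFunction<byte[]> blockLoader) {
        this.blockLoader = blockLoader;
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    void addIndexEntry(String firstKeyInBlock, long offset) {
        index.put(firstKeyInBlock, offset);
    }

    /** Returns the block that could hold rowKey, or null if the key sorts before every block. */
    byte[] blockFor(String rowKey) {
        Map.Entry<String, Long> entry = index.floorEntry(rowKey);
        if (entry == null) {
            return null;
        }
        long offset = entry.getValue();
        byte[] block = cache.get(offset);
        if (block == null) { // cache miss: fetch only this block
            block = blockLoader.apply(offset);
            loads++;
            cache.put(offset, block);
        }
        return block;
    }

    public static void main(String[] args) {
        BlockCacheSketch reader = new BlockCacheSketch(2, offset -> new byte[64 * 1024]);
        reader.addIndexEntry("a", 0L);
        reader.addIndexEntry("m", 65536L);
        reader.blockFor("b");
        reader.blockFor("c"); // same block as "b", served from the cache
        System.out.println("block loads: " + reader.loads);
    }
}
```

The point of the sketch is that a random read costs at most one small block fetch, and repeated reads near the same key cost nothing beyond a map lookup.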
-
Re: Hbase performance with HDFS - Andrew Purtell 2011-07-07, 19:11
Some thoughts off the top of my head. Lars' architecture material might/should cover this too. Pretty sure his book will.

Regarding reads:

One does not have to read a whole HDFS block. You can request arbitrary byte ranges within the block, via positioned reads. (It is true also that HDFS can be improved for better random reading performance in ways not necessarily yet committed to trunk or especially a 0.20.x branch with append support for HBase. See https://issues.apache.org/jira/browse/HDFS-1323)

HBase keeps the indexes to its store files in HDFS in memory. We also open all store files at the HDFS layer and stash those references. Additionally, users can specify the use of bloom filters to improve query time performance through wholesale skipping of HFile reads if they are known not to contain data that satisfies the query. Bloom filters are held in memory as well.

So with indexes resident in memory, when handling Gets we know the byte ranges within HDFS block(s) that contain the data of interest. With positioned reads we retrieve only those bytes from a DataNode. With optional bloom filters we avoid whole HFiles entirely.

Regarding writes:

I think you should consult the bigtable paper again if you are still asking about the write path. The database is log structured. Writes are accumulated in memory, and flushed all at once. Later, flush files are compacted as needed, because as you point out GFS and HDFS are optimized for streaming sequential reads and writes.

Best regards,

- Andy
Andrew Purtell 2011-07-07, 19:11
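To make the bloom filter point concrete, here is a minimal sketch of how a per-file filter lets a reader skip store files that cannot contain a row. It is a toy under stated assumptions, not HBase's implementation: the class `BloomSkipSketch`, the double-hashing scheme built on `String.hashCode()`, and the helper `filesToRead` are all hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: one in-memory Bloom filter per store file. */
final class BloomSkipSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSkipSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Derive the i-th probe position from the key's hash (simple double hashing).
    private int probe(String rowKey, int i) {
        int h1 = rowKey.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(probe(rowKey, i));
        }
    }

    /** false means "definitely not in this file"; true means "possibly present". */
    boolean mightContain(String rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(probe(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    /** Counts the files whose filter cannot rule the key out, i.e. the files a Get must read. */
    static int filesToRead(List<BloomSkipSketch> filters, String rowKey) {
        int count = 0;
        for (BloomSkipSketch filter : filters) {
            if (filter.mightContain(rowKey)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BloomSkipSketch fileWithRow = new BloomSkipSketch(1024, 3);
        fileWithRow.add("row-17");
        BloomSkipSketch emptyFile = new BloomSkipSketch(1024, 3);
        System.out.println("files to read: "
                + filesToRead(Arrays.asList(fileWithRow, emptyFile), "row-17"));
    }
}
```

A filter can return a false positive (an unnecessary read) but never a false negative, which is why skipping a file on a `false` answer is safe.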
-
Re: Hbase performance with HDFS - Mohit Anchlia 2011-07-07, 19:30
Thanks, that helps! Just a few more questions:

You mentioned compactions. When do those occur and what triggers them? Does it cause additional space usage when that happens? If it does, it would mean you always need to have much more disk than you really need.

Since HDFS is mostly write once, how are updates/deletes handled?

Is HBase also suitable for blobs?
Mohit Anchlia 2011-07-07, 19:30
-
Re: Hbase performance with HDFS - Doug Meil 2011-07-07, 19:38
Hi there-

You should read the architecture section...

http://hbase.apache.org/book.html#architecture

re: "blobs"

http://hbase.apache.org/book.html#supported.datatypes
Doug Meil 2011-07-07, 19:38
-
Re: Hbase performance with HDFS - Andrew Purtell 2011-07-07, 20:53
> You mentioned about compactions, when do those occur and what triggers
> them?

Compactions are triggered by an algorithm that monitors the number of flush files in a store and their sizes, and is configurable in several dimensions.

> Does it cause additional space usage when that happens

Yes.

> if it
> does it would mean you always need to have much more disk then you
> really need.

Not all regions are compacted at once. Each region by default is constrained to 256 MB. Not all regions will hold the full amount of data. The result is not a perfect copy (doubling) if some data has been deleted or is associated with TTLs that have expired. The merge sorted result is moved into place and the old files are deleted as soon as the compaction completes. So how much more is "much more"? You can't write to any kind of data store on a (nearly) full volume anyway, no matter HBase/HDFS, or MySQL, or...

> Since HDFS is mostly write once how are updates/deletes handled?

Not mostly, only write once.

From the BigTable paper, section 5.3: "A valid read operation is executed on a merged view of the sequence of SSTables and the memtable. Since the SSTables and the memtable are lexicographically sorted data structures, the merged view can be formed efficiently." So what this means is all the store files and the memstore serve effectively as change logs sorted in reverse chronological order.

Deletes are just another write, but one that writes tombstones "covering" data with older timestamps.

When serving queries, HBase searches store files back in time until it finds data at the coordinates requested or a tombstone.

The process of compaction not only merge sorts a bunch of accumulated store files (from flushes) into fewer store files (or one) for read efficiency, it also performs housekeeping, dropping data "covered" by the delete tombstones. Incidentally this is also how TTLs are supported: expired values are dropped as well.

Best regards,

- Andy
Andrew Purtell 2011-07-07, 20:53
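The tombstone and compaction behaviour described above can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the thread: each store file is modelled as a sorted `TreeMap`, a tombstone is an empty `Optional`, per-cell timestamps are ignored (the newer file simply wins), and the compaction merges every file at once so tombstones themselves can be dropped. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch: reads over write-once sorted files, with delete tombstones. */
final class TombstoneMergeSketch {

    /**
     * Searches files from newest to oldest and stops at the first file that
     * mentions the row. An empty Optional stored in a file is a tombstone.
     * Returns empty both for deleted rows and for rows that never existed.
     */
    static Optional<String> get(List<TreeMap<String, Optional<String>>> newestFirst, String row) {
        for (TreeMap<String, Optional<String>> file : newestFirst) {
            Optional<String> cell = file.get(row);
            if (cell != null) {
                return cell; // a value or a tombstone: older files are "covered"
            }
        }
        return Optional.empty();
    }

    /**
     * Merges all files into one new sorted file. Newer entries overwrite older
     * ones, then tombstones are dropped because nothing older remains to cover.
     */
    static TreeMap<String, Optional<String>> compact(List<TreeMap<String, Optional<String>>> newestFirst) {
        TreeMap<String, Optional<String>> merged = new TreeMap<>();
        for (int i = newestFirst.size() - 1; i >= 0; i--) { // oldest first
            merged.putAll(newestFirst.get(i));
        }
        merged.values().removeIf(cell -> !cell.isPresent());
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, Optional<String>> older = new TreeMap<>();
        older.put("row1", Optional.of("v1"));
        older.put("row2", Optional.of("v2"));
        TreeMap<String, Optional<String>> newer = new TreeMap<>();
        newer.put("row1", Optional.empty()); // delete of row1, written as a tombstone
        newer.put("row3", Optional.of("v3"));

        List<TreeMap<String, Optional<String>>> files = Arrays.asList(newer, older);
        System.out.println("row1 visible: " + get(files, "row1").isPresent());
        System.out.println("rows after compaction: " + compact(files).keySet());
    }
}
```

Neither input file is modified: the delete is a new write, and the compaction produces a new file, which is how updates and deletes fit a write-once file system.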
-
Re: Hbase performance with HDFS - Mohit Anchlia 2011-07-07, 21:01
Thanks Andrew. Really helpful. I think I have one more question right now :) Underneath, HDFS replicates blocks, 3 copies by default. I'm not sure how that relates to HFiles and compactions. When a compaction occurs, does it also happen on the replica blocks on other nodes? If not, then how does it work when one node fails?
Mohit Anchlia 2011-07-07, 21:01
-
RE: Hbase performance with HDFS - Buttler, David 2011-07-07, 21:43
The nice part of using HDFS as the file system is that the replication is taken care of by the file system. So, when the compaction finishes, that means the replication has already taken place.
Buttler, David 2011-07-07, 21:43
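The read path and compaction that Andy describes in the quoted text can be sketched as a toy model. This is only an illustration under stated assumptions: each store file is a plain Python dict of row key to (timestamp, value), the names `TOMBSTONE`, `read`, and `compact` are made up here and are not HBase APIs, and the compaction merges every store file at once.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical marker standing in for a delete tombstone.
TOMBSTONE = object()

Cell = Tuple[int, object]   # (timestamp, value)
Store = Dict[str, Cell]     # toy stand-in for a memstore or a store file


def read(key: str, memstore: Store, store_files: List[Store]) -> Optional[object]:
    """Search the memstore, then store files from newest to oldest.

    The search stops at the first value or tombstone found for the key.
    `store_files` is assumed to be ordered newest first.
    """
    for store in [memstore, *store_files]:
        if key in store:
            _, value = store[key]
            return None if value is TOMBSTONE else value
    return None


def compact(store_files: List[Store], now: int, ttl: int) -> Store:
    """Merge all store files into one, keeping the newest cell per key.

    Cells covered by a tombstone and cells older than `ttl` are dropped.
    """
    newest: Store = {}
    for store in store_files:
        for key, (ts, value) in store.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    return {
        key: newest[key]
        for key in sorted(newest)
        if newest[key][1] is not TOMBSTONE and now - newest[key][0] <= ttl
    }
```

Because the files are never modified in place, a delete only becomes a space saving once a compaction rewrites the data without the covered cells, which fits the write-once nature of HDFS discussed in the thread.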
-
Re: Hbase performance with HDFS
Mohit Anchlia 2011-07-07, 22:02
Thanks! I understand what you mean, however I have a little confusion. Does it mean there are unused blocks sitting around? For example:

HFile1 with 3 blocks spread across 3 nodes: Node A:(b1),b2,b3, Node B:b1,(b2),b3 and Node C:b1,b2,(b3).

HFile2 with 3 blocks spread across 3 nodes: Node A:(b1),b2,b3, Node B:b1,(b2),b3 and Node C:b1,b2,(b3).

I have 2 questions:

1) When compactions occur on Node A, would they also include b2 and b3, which are actually redundant copies? My guess is yes.
2) Now compaction occurs and creates HFile3, which as you said is replicated. But what happens to HFile1 and HFile2? I am assuming they get deleted.

Thanks for everyone's patience!

On Thu, Jul 7, 2011 at 2:43 PM, Buttler, David <[EMAIL PROTECTED]> wrote:
> The nice part of using HDFS as the file system is that the replication is taken care of by the file system. So, when the compaction finishes, that means the replication has already taken place.
-
Re: Hbase performance with HDFS
Andrew Purtell 2011-07-07, 22:12
> 1) When compactions occur on Node A would it also include b2 and b3
> which is actually a redundant copy? My guess is yes.

I don't follow your question.

HDFS files are read by opening an input stream. This stream is fed data from block replicas chosen at random. One block replica for each block. The reader doesn't see "redundant copies".

> 2) Now compaction occurs and creates HFile3 which as you said is
> replicated. But what happens to HFile1 and HFile2? I am assuming it
> gets deleted.

They are deleted.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
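A minimal sketch of the point in this reply, assuming a toy layout in which each block of a file is a dict mapping node name to that node's replica bytes. The function `read_file` and the node names are hypothetical, and picking a replica uniformly at random simply follows the description in the message; it is a simplification of whatever selection policy a real HDFS client applies.

```python
import random
from typing import Dict, List


def read_file(blocks: List[Dict[str, bytes]], rng: random.Random) -> bytes:
    """Read a toy file by taking exactly one replica of each block.

    The reader receives a single byte stream; which node served each
    block is invisible to it.
    """
    data = b""
    for replicas in blocks:
        node = rng.choice(sorted(replicas))  # one replica per block
        data += replicas[node]
    return data


# HFile1 from the question: three blocks, each replicated on nodes A, B, C.
hfile1 = [
    {"A": b"b1", "B": b"b1", "C": b"b1"},
    {"A": b"b2", "B": b"b2", "C": b"b2"},
    {"A": b"b3", "B": b"b3", "C": b"b3"},
]
stream = read_file(hfile1, random.Random(0))
```

Whichever replicas are chosen, the compaction reading this file sees each block once, so replication adds no extra input to the merge.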
-
Re: Hbase performance with HDFS
Mohit Anchlia 2011-07-07, 22:17
Got it. I was thinking compaction happens local to the node. But it makes sense from what you have explained.

On Thu, Jul 7, 2011 at 3:12 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>> 1) When compactions occur on Node A would it also include b2 and b3
>> which is actually a redundant copy? My guess is yes.
>
> I don't follow your question.
>
> HDFS files are read by opening an input stream. This stream is fed data from block replicas chosen at random. One block replica for each block. The reader doesn't see "redundant copies".
>
>> 2) Now compaction occurs and creates HFile3 which as you said is
>> replicated. But what happens to HFile1 and HFile2? I am assuming it
>> gets deleted.
>
> They are deleted.
-
Re: Hbase performance with HDFS
Arvind Jayaprakash 2011-07-11, 13:34
On Jul 07, Andrew Purtell wrote:
>> Since HDFS is mostly write once how are updates/deletes handled?
>
>Not mostly, only write once.
>
>Deletes are just another write, but one that writes tombstones
>"covering" data with older timestamps.
>
>When serving queries, HBase searches store files back in time until it
>finds data at the coordinates requested or a tombstone.
>
>The process of compaction not only merge sorts a bunch of accumulated
>store files (from flushes) into fewer store files (or one) for read
>efficiency, it also performs housekeeping, dropping data "covered" by
>the delete tombstones. Incidentally this is also how TTLs are
>supported: expired values are dropped as well.

Just wanted to talk about WAL. My understanding is that updates are journalled onto HDFS by sequentially recording them as they happen per region. This is where the need for HDFS append comes in, something that I don't recollect seeing in the GFS paper.

Despite having support for append in HDFS, it is still expensive to update it on every byte and here is where the wal flushing policies come in.
-
Re: Hbase performance with HDFS
Andrew Purtell 2011-07-11, 16:20
> Despite having support for append in HDFS, it is still expensive to
> update it on every byte and here is where the wal flushing policies come
> in.

Right, but a minor correction here. HBase doesn't flush the WAL per byte. We do a "group commit" of all changes to a row, to the extent the user has grouped changes to the row into a Put. So at the least this is first a write of all the bytes of an edit, or it could be more than one edit if we can group them, and _then_ a sync.

Also most who run HBase run a HDFS patched with HDFS-895, so multiple syncs can be in flight. This does not reduce the added latency of a sync for the current writer but it does significantly reduce the expense of the sync with respect to other parallel writers.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
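The "write all the bytes of an edit, and then a sync" ordering described in this reply can be sketched roughly as follows. This is a toy model, not HBase code: `ToyWAL`, `apply_put`, and the in-memory lists standing in for buffered and durable log data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Edit = Tuple[str, str, object]  # (row, column, value)


class ToyWAL:
    """Hypothetical write-ahead log that counts how often it is synced."""

    def __init__(self) -> None:
        self.buffer: List[Edit] = []   # appended, not yet durable
        self.durable: List[Edit] = []  # durable after a sync
        self.sync_count = 0

    def append(self, edit: Edit) -> None:
        self.buffer.append(edit)

    def sync(self) -> None:
        # One sync makes everything appended so far durable.
        self.durable.extend(self.buffer)
        self.buffer.clear()
        self.sync_count += 1


def apply_put(wal: ToyWAL, row: str, columns: Dict[str, object]) -> None:
    """Append every change in the Put first, then sync once."""
    for column, value in columns.items():
        wal.append((row, column, value))
    wal.sync()


wal = ToyWAL()
apply_put(wal, "row1", {"cf:a": 1, "cf:b": 2, "cf:c": 3})
```

In this sketch three column changes cost a single sync rather than three, which is the sense in which the cost is amortized over the whole Put instead of being paid per byte.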
-
Re: Hbase performance with HDFS
Ted Dunning 2011-07-11, 16:47
Also, on MapR, you get another level of group commit above the row level. That takes the writes even further from the byte by byte level.

On Mon, Jul 11, 2011 at 9:20 AM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> > Despite having support for append in HDFS, it is still expensive to
> > update it on every byte and here is where the wal flushing policies come
> > in.
>
> Right, but a minor correction here. HBase doesn't flush the WAL per byte.
> We do a "group commit" of all changes to a row, to the extent the user has
> grouped changes to the row into a Put. So at the least this is first a write
> of all the bytes of an edit, or it could be more than one edit if we can
> group them, and _then_ a sync.
>
> Also most who run HBase run a HDFS patched with HDFS-895, so multiple syncs
> can be in flight. This does not reduce the added latency of a sync for the
> current writer but it does significantly reduce the expense of the sync with
> respect to other parallel writers.
-
Re: Hbase performance with HDFS
Joey Echeverria 2011-07-11, 18:22
On Mon, Jul 11, 2011 at 12:47 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> Also, on MapR, you get another level of group commit above the row level.
> That takes the writes even further from the byte by byte level.

Is this done with an HBASE patch? I don't see how this could be done merely at the FS layer.

-Joey

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
-
Re: Hbase performance with HDFS
Ted Dunning 2011-07-11, 18:57
On Mon, Jul 11, 2011 at 11:22 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 11, 2011 at 12:47 PM, Ted Dunning <[EMAIL PROTECTED]>
> wrote:
> > Also, on MapR, you get another level of group commit above the row level.
> > That takes the writes even further from the byte by byte level.
>
> Is this done with an HBASE patch? I don't see how this could be done
> merely at the FS layer.
>

:-)

No changes were required in HBase to enable this.
-
Re: Hbase performance with HDFS
Joey Echeverria 2011-07-11, 19:37
> :-)
>
> No changes were required in HBase to enable this.

Do the semantics of sync change? Do you pause one or more outstanding syncs, sync a group of data (4KB maybe) and then return from all of those outstanding syncs simultaneously?

-Joey

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
-
Re: Hbase performance with HDFS
M. C. Srivas 2011-07-11, 19:48
On Mon, Jul 11, 2011 at 12:37 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> > :-)
> >
> > No changes were required in HBase to enable this.
>
> Do the semantics of sync change? Do you pause one or more outstanding
> syncs, sync a group of data (4KB maybe) and then return from all of
> those outstanding syncs simultaneously?
>

HDFS "sync" is merely a subset of full-fledged NFS support (which includes fully random-write, NFS commit and NFS fsync). So it was trivially easy to support hbase's requirements in MapR.

>
> -Joey
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
-
Re: Hbase performance with HDFS
Luke Lu 2011-07-11, 20:33
On Mon, Jul 11, 2011 at 12:37 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> Do the semantics of sync change? Do you pause one or more outstanding
> syncs, sync a group of data (4KB maybe) and then return from all of
> those outstanding syncs simultaneously?

Group commit is a standard storage technique that trades a little latency for throughput. Yes, it can return the sync responses at the same time (or asynchronously) once all the outstanding syncs have been synced to a WAL. A good group commit implementation would adaptively change the latency window depending on the current throughput.

__Luke
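A rough sketch of group commit as described in this message, assuming a single-threaded toy in which sync requests queue up and one flush acknowledges all of them together. The class `GroupCommitter` and its methods are hypothetical names, and the adaptive latency window mentioned above is deliberately left out: here the caller decides when to flush.

```python
from typing import List, Tuple


class GroupCommitter:
    """Toy log that batches outstanding sync requests into one flush."""

    def __init__(self) -> None:
        self.pending: List[Tuple[str, bytes]] = []  # (writer id, data)
        self.log: List[bytes] = []                  # durable data
        self.flushes = 0

    def request_sync(self, writer_id: str, data: bytes) -> None:
        # The writer now waits; nothing is durable yet.
        self.pending.append((writer_id, data))

    def flush(self) -> List[str]:
        """Make every pending request durable with a single flush.

        Returns the writers that can be acknowledged, all at once.
        """
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        self.log.extend(data for _, data in batch)
        self.flushes += 1
        return [writer_id for writer_id, _ in batch]


committer = GroupCommitter()
committer.request_sync("w1", b"edit-1")
committer.request_sync("w2", b"edit-2")
committer.request_sync("w3", b"edit-3")
acked = committer.flush()
```

Each writer waits slightly longer for its acknowledgement, but three syncs cost one flush; that is the latency-for-throughput trade the message refers to.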
-
Re: Hbase performance with HDFS
Ted Dunning 2011-07-11, 20:47
No, the semantics do not change.

On Mon, Jul 11, 2011 at 12:37 PM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
> > :-)
> >
> > No changes were required in HBase to enable this.
>
> Do the semantics of sync change? Do you pause one or more outstanding
> syncs, sync a group of data (4KB maybe) and then return from all of
> those outstanding syncs simultaneously?
>
-
Re: Hbase performance with HDFS
Stack 2011-07-12, 03:45
Ted, you seem to be describing voodoo? Are you talking of a group commit of the group commits? Bigger batches at the layer below dfsclient?
St.Ack

On Mon, Jul 11, 2011 at 11:57 AM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 11, 2011 at 11:22 AM, Joey Echeverria <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Jul 11, 2011 at 12:47 PM, Ted Dunning <[EMAIL PROTECTED]>
>> wrote:
>> > Also, on MapR, you get another level of group commit above the row level.
>> > That takes the writes even further from the byte by byte level.
>>
>> Is this done with an HBASE patch? I don't see how this could be done
>> merely at the FS layer.
>>
>
> :-)
>
> No changes were required in HBase to enable this.
>
-
Re: Hbase performance with HDFS
Ted Dunning 2011-07-12, 04:54
Hardly voodoo, but also not something that can be done casually. You need strong transactional guarantees from the file system layer to do this.

And yes, it does come down to something like groups of group commits. It didn't require patching the layer below dfsclient so much as correct and careful design of the layer below that layer.

I should repeat that this only happens on MapR; we didn't touch the HDFS code. I would expect that getting this to work correctly in HDFS at that layer would be extremely difficult: you would have a huge proof-of-correctness task, because the lower layers of HDFS are not well specified in terms of temporal semantics.

On Mon, Jul 11, 2011 at 8:45 PM, Stack <[EMAIL PROTECTED]> wrote:
> Ted, you seem to be describing voodoo? Are you talking of a group
> commit of the group commits? Bigger batches at the layer below
> dfsclient?
> St.Ack
-
Re: Hbase performance with HDFS
Doug Meil 2011-07-07, 19:13
Hi there-

There is a FAQ entry in the Hbase book on this exact question.

http://hbase.apache.org/book.html#faq.hdfs.hbase

On 7/7/11 2:53 PM, "Mohit Anchlia" <[EMAIL PROTECTED]> wrote:
>I have looked at bigtable and it's ssTables etc. But my question is
>directly related to how it's used with HDFS. HDFS recommends large
>files, bigger blocks, write once and read many sequential reads. But
>accessing small rows and writing small rows is more random and
>different than inherent design of HDFS. How do these 2 go together and
>is able to provide performance.
>
>On Thu, Jul 7, 2011 at 11:22 AM, Andrew Purtell <[EMAIL PROTECTED]>
>wrote:
>> Hi Mohit,
>>
>> Start here: http://labs.google.com/papers/bigtable.html
>>
>> Best regards,
>>
>> - Andy