-
Using HBase on other file systems Kevin Apte 2010-05-09, 12:08
I am wondering if anyone has thought about using HBase on other file systems like "Gluster". I think Gluster may offer much faster performance without exorbitant cost. With Gluster, you would have to fetch the data from the "Storage Bricks" and process it in your own environment. This allows the servers that are used as storage nodes to be very cheap. I think with Gluster you can fetch data from any of the nodes.

I would imagine this would be a little slower than server-attached storage, but I think having a 10 Gbps network with enough bandwidth may make this a non-issue.

Any comments? No, I do not work for Gluster. I have just started researching this, so I have not fact-checked it adequately.

Kevin
-
Re: Using HBase on other file systems Andrew Purtell 2010-05-09, 16:44
Our experience with Gluster 2 is that self heal when a brick drops off the network is very painful. The high performance impact lasts for a long time. I'm not sure, but I think Gluster 3 may only rereplicate missing sections instead of entire files. On the other hand, I would not trust Gluster 3 to be stable (yet).

I've also tried KFS. My experience seems to bear out other observations that it is ~30% slower than HDFS. Also, I was unable to keep the chunkservers up on my CentOS 5 based 64-bit systems. I gave Sriram shell access so he could poke around coredumps with gdb, but there was no satisfactory resolution.

Another team at Trend is looking at Ceph. I think it is a highly promising filesystem, but at the moment it is an experimental filesystem undergoing a high rate of development that requires another experimental filesystem undergoing a high rate of development (btrfs) for recovery semantics, and the web site warns "NOT SAFE YET" or similar. I doubt it has ever been tested on clusters > 100 nodes. In contrast, HDFS has been running in production on clusters with 1000s of nodes for a long time.

There currently is not a credible competitor to HDFS, in my opinion. Ceph is definitely worth keeping an eye on, however. I wonder if HDFS will evolve to offer a similar scalable metadata service (NameNode) to compete. Certainly that would improve its scalability and availability story, both issues today presenting barriers to adoption, and barriers for anything layered on top, like HBase.

- Andy
-
Re: Using HBase on other file systems Amandeep Khurana 2010-05-09, 20:56
I have HBase running over Ceph on a small cluster here at UC Santa Cruz and am evaluating its performance as compared to HDFS. You'll see some numbers soon.

Theoretically, HBase can work on any filesystem. Either the filesystem has a POSIX client that you can mount, so that HBase can use it as a raw filesystem (file:///mount/filesystem), or you'll need to extend the FileSystem class to write a client that Hadoop Core can use.

-Amandeep

Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
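The two routes described above (a POSIX mount addressed as file:///..., or a dedicated client class) both come down to choosing a storage backend from the scheme of the root-directory URI. The following is a minimal sketch of that idea only, under stated assumptions: `RootDirSketch`, `register`, `resolve`, and the backend descriptions are hypothetical names invented for illustration and are not Hadoop's or HBase's API, and the hdfs:// host and port are made up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not Hadoop's API) of picking a backend from a root-directory URI. */
class RootDirSketch {
    // Hypothetical registry: URI scheme -> description of the backend that would serve it.
    private final Map<String, String> backends = new HashMap<>();

    void register(String scheme, String backend) {
        backends.put(scheme, backend);
    }

    /** Returns the backend registered for the URI's scheme, or throws if none is known. */
    String resolve(String rootDir) {
        URI uri = URI.create(rootDir);
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("no scheme in " + rootDir);
        }
        String backend = backends.get(scheme);
        if (backend == null) {
            throw new IllegalArgumentException("no backend for scheme " + scheme);
        }
        return backend;
    }

    /** Registers the two routes mentioned in the thread: a mounted POSIX path and HDFS. */
    static RootDirSketch withDefaults() {
        RootDirSketch sketch = new RootDirSketch();
        sketch.register("hdfs", "distributed client");
        sketch.register("file", "local/POSIX mount");
        return sketch;
    }
}
```

With the defaults, `resolve("file:///mount/filesystem")` selects the mounted-path route. A filesystem with its own client class would correspond to one more `register` call for its scheme; until then, an unregistered scheme is rejected with an exception.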
-
Re: Using HBase on other file systems Andrew Purtell 2010-05-10, 03:53
> or you'll need to extend the FileSystem class to write a client that Hadoop
> Core can use.

There is one: https://issues.apache.org/jira/browse/HADOOP-6253

It even exports stripe locations in a way useful for distributing MR task placement, but provides only one host per "block".

- Andy
-
Re: Using HBase on other file systems Jeff Hammerbacher 2010-05-11, 19:51
Hey,

Thanks for the evaluation, Andrew. Ceph certainly is elegant in design; HDFS, similar to GFS [1], was purpose-built to get into production quickly, so its current incarnation lacks some of the same elegance. On the other hand, there are many techniques for making the metadata servers scalable and highly available. HDFS has the advantage of already storing hundreds of petabytes across thousands of organizations, so we're able to guide those design decisions with empirical data from heavily used clusters. We'd love to have heavy users of HBase contribute to the discussions of scalability [2] and availability [3] of HDFS. See also the excellent article from Konstantin Shvachko of Yahoo! on HDFS scalability [4].

I've also conducted extensive reviews at both Facebook and now at Cloudera of alternative file systems, but at this stage, I concur with Andrew: HDFS is the only reasonable open source choice for production data processing workloads. I'm also optimistic that the scalability and availability challenges will be addressed by the (very active and diverse) HDFS developer community over the next few years, and we'll benefit from the work that's already been put into the robustness and manageability of the system.

Regardless, every technology improves more rapidly when there's strong competition, so it will be good to see one of these other file systems emerge as a viable alternative to HDFS for HBase storage some day.

[1] http://cacm.acm.org/magazines/2010/3/76283-gfs-evolution-on-fast-forward/fulltext
[2] https://issues.apache.org/jira/browse/HDFS-1051
[3] https://issues.apache.org/jira/browse/HDFS-1064
[4] http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html

Later,
Jeff
-
Re: Using HBase on other file systems Edward Capriolo 2010-05-11, 20:28
HBase is the most square peg, round hole piece of software ever (not an insult, read on). HDFS was designed for high throughput streaming batch processing. The random access support is not good. HBase gets around the HDFS shortcomings using caching, HFiles, compaction processes, etc. to make HDFS (the tape drive) seem great at all these things it is not good at.

One compelling reason to use HBase is that you are already using HDFS for other things. IMHO, if you do not need HDFS, you do not really need HBase. One of the other unnamed distributed key value stores will get the job done.
-
Re: Using HBase on other file systems Jeff Hammerbacher 2010-05-11, 21:03
Hey Edward,

Database systems have been built for decades against a storage medium (spinning magnetic platters) which has the same characteristics you point out in HDFS. In the interim, they've managed to service a large number of low latency workloads in a reasonable fashion. There's a reason the capstone assignment in the first databases course at Wisconsin was for years an implementation of the PostgreSQL buffer pool--the caching logic for low latency random access is the hard part.

Having participated in the design and implementation of one of these other data stores to which you refer, I agree that there are flaws in the BigTable design. On the other hand, Solaris and Mach look a lot better on paper than the Linux kernel. If you consider HBase to be a direct implementation of the BigTable design, then I would argue that system has unequivocally proven its utility at scale. Choosing a technology based on the problems it has solved rather than the elegance of the design helps minimize project risk, in my experience.

Some day soon, as with databases, only a small subset of people well versed in systems design will be arguing over implementation strategies. The rest of the world will be using these technologies to solve problems and be worried more about the interfaces they provide. I'm excited for HBase to reach that stage.

Thanks,
Jeff
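To make the buffer pool remark concrete, here is a minimal sketch of a fixed-capacity cache that keeps recently used blocks in memory and evicts the least recently used one. It rests on assumptions worth stating: plain LRU is swapped in purely because it is the shortest policy to show, it is not claimed to be the policy PostgreSQL or HBase actually uses, and the class name `LruBlockCache` is hypothetical. It relies only on the access-order mode of `java.util.LinkedHashMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative LRU cache (hypothetical name); not the caching code of any real database. */
class LruBlockCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruBlockCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting blocks "a" and "b", reading "a", and then inserting "c" evicts "b", because "a" was touched more recently. Real buffer pools also have to handle pinning, dirty pages, and concurrency, which is where the difficulty mentioned above comes from.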
-
Re: Using HBase on other file systems Jeff Hammerbacher 2010-05-11, 21:40
Okay, the assertion that HBase is only interesting if you need HDFS is continuing to rankle me. On the surface, it sounds reasonable, but it's just so wrong. The specifics cited (caching, HFile, and compaction) are actually all advantages of the HBase design.

1) Caching: any data store which targets multiple kinds of storage media with different latency characteristics will cache. Not interesting, and totally confusing to me how this could be cited as a disadvantage.

2) HFile: HFile is an on-disk layout of data to minimize seeks for random accesses while not hampering scans. Every system which stores data to magnetic drives must decide how to lay bits out on platters. HFile doesn't go that low, of course, but it's not an artifact of HBase using HDFS; see, e.g., https://issues.apache.org/jira/browse/CASSANDRA-674 or http://blog.basho.com/2010/04/27/hello,-bitcask/. Avro defines an object file container format (http://avro.apache.org/docs/current/spec.html#Object+Container+Files) for the same purpose. HFile squeezes a lot of performance out of Java and is a pretty reasonable implementation. Again, I'm totally confused why this is cited as a disadvantage.

3) Compactions: HBase, like many modern data stores, is really just a hierarchy of buffers; some in memory, some on disk. Because of the characteristics of magnetic storage, this log-structured merge tree strategy does a nice job of minimizing seeks on the write path while reducing disk fragmentation on the read path. There is a slight penalty on the read path, as data can live in any of the buffers, but if you've ever managed a long-lived MySQL database, you'll be glad to amortize your pain across each read rather than paying the huge penalty of having a highly fragmented database send the disk head all across the disk during a scan.
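The "hierarchy of buffers" in point 3 can be sketched in a few lines. This is a toy model under stated assumptions, not HBase's implementation: `LsmSketch` and its methods are hypothetical names, the "on-disk" segments are simply immutable in-memory sorted maps, and deletes, the write-ahead log, and real file formats are left out. It shows writes going to an in-memory buffer, flushes producing sorted segments, reads checking the newest data first, and compaction merging segments so reads have fewer places to look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy log-structured merge store (hypothetical); segments stand in for on-disk files. */
class LsmSketch {
    private final int flushThreshold;
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Flushed sorted segments, oldest first and newest last.
    private final List<TreeMap<String, String>> segments = new ArrayList<>();

    LsmSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Writes only touch the in-memory buffer; a full buffer is flushed as a new segment. */
    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        segments.add(memtable);
        memtable = new TreeMap<>();
    }

    /** Reads check the memtable, then segments from newest to oldest (the read penalty). */
    String get(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = segments.size() - 1; i >= 0; i--) {
            value = segments.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Merges all segments into one; applying oldest first lets newer values overwrite. */
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> segment : segments) {
            merged.putAll(segment);
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(merged);
        }
    }

    int segmentCount() {
        return segments.size();
    }
}
```

Before compaction, a lookup may have to visit every segment; afterwards it visits one, which is the amortization trade-off described above.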
It is true that you could dedicate a single disk to the WAL rather than putting it on a DFS, and that may result in better performance; on the other hand, you increase system complexity, as you now have to implement replication and consistency guarantees for the WAL data if you want to survive machine failure.

I certainly don't want this consternation to be perceived as ad hominem: I'm much more frustrated by the logic of the statement seeming reasonable on the surface, which is the level at which most people are able to evaluate systems, but being just completely wrong when examined in detail. There are just too many storage systems to choose from these days, and specious arguments for one or the other must be put to rest so users can make well informed decisions and not just latch on to the next shiny object that comes along.
-
Re: Using HBase on other file systems Edward Capriolo 2010-05-11, 22:14
Jeff,

> Choosing a technology based on the problems it has solved rather than the
> elegance of the design helps minimize project risk, in my experience.

I agree with this. I am not trying to imply that HBase is risky or not proven at scale.

I do think that if you compare GoogleFS to HDFS, GFS looks more full featured. HDFS seems to be very focused on what I consider a pure implementation, primarily designed for map reduce workloads.

I do believe my logic is reasonable. HBase has a lot of code designed around HDFS. We know these tickets that get cited all the time, for better random reads, or for sync() support. HBase gets the benefits of HDFS and has to deal with its drawbacks. Other key value stores handle storage directly.

Do not be rankled :) What I meant, more or less, is that HBase is always a solution for a key value store. It is an even better solution if you want the underlying data stored on HDFS to run map/reduce efficiently on the data.

However, the topic started as "Can I run HBase on top of something besides HDFS?". The quick answers are:

theoretically: yes
practically (as in by tomorrow, without knowledge of the code base): no
-
Re: Using HBase on other file systems Jeff Hammerbacher 2010-05-11, 22:29
Hey Edward,

> I do think that if you compare GoogleFS to HDFS, GFS looks more full
> featured.

What features are you missing? Multi-writer append was explicitly called out by Sean Quinlan as a bad idea, and rolled back. From internal conversations with Google engineers, erasure coding of blocks suffered a similar fate. Native client access would certainly be nice, but FUSE gets you most of the way there. Scalability/availability of the NN, RPC QoS, and alternative block placement strategies are second-order features which didn't exist in GFS until later in its lifecycle of development as well. HDFS is following a similar path and has JIRA tickets with active discussions. I'd love to hear your feature requests, and I'll be sure to translate them into JIRA tickets.

> I do believe my logic is reasonable. HBase has a lot of code designed around
> HDFS. We know these tickets that get cited all the time, for better random
> reads, or for sync() support. HBase gets the benefits of HDFS and has to
> deal with its drawbacks. Other key value stores handle storage directly.

Sync() works and will be in the next release, and its absence was simply a result of the youth of the system. Now that that limitation has been removed, please point to another place in the code where using HDFS rather than the local file system is forcing HBase to make compromises. Your initial attempts on this front (caching, HFile, compactions) were, I hope, debunked by my previous email. It's also worth noting that Cassandra does all three, despite managing its own storage.

I'm trying to learn from this exchange and always enjoy understanding new systems. Here's what I have so far from your arguments:

1) HBase inherits both the advantages and disadvantages of HDFS. I clearly agree on the general point; I'm pressing you to name some specific disadvantages, in hopes of helping prioritize our development of HDFS. So far, you've named things which are either a) not actually disadvantages or b) no longer true. If you can come up with the disadvantages, we'll certainly take them into account. I've certainly got a number of them on our roadmap.

2) If you don't want to use HDFS, you won't want to use HBase. Also certainly true, but I'm not sure there's much to learn from this assertion. I'd once again ask: why would you not want to use HDFS, and what is your choice in its stead?

Thanks,
Jeff
-
RE: Using HBase on other file systemsButtler, David 2010-05-11, 23:40
If you are opening up the discussion to HDFS, I would really like to think more deeply as to why HDFS is a better choice for some workloads than, say, Lustre or GPFS.

The things I like about HDFS over Lustre are that
1) it is easier to set up
2) HDFS by default has local storage (as opposed to storage attached networks, which is more typical for Lustre deployments), making data locality for M/R jobs standard
3) HDFS lives in the Java world [which could be interpreted as a drawback I suppose]

Dave
-
Re: Using HBase on other file systemsKevin Apte 2010-05-12, 04:12
I think Gluster also supports large amounts of data, but as I understand it, Gluster nodes are meant to be "Bricks", that is, they are only meant for storage.

In Map-Reduce use, people talk about Map/Reduce jobs running near the storage. What does it mean?
- They run on the same node that has the disks, so they are able to retrieve data fast.
- They run closer to the node that has the data; this can reduce network traffic.

I think with the declining cost of servers and network switch ports, this should become less of an issue.

I personally like a "Gluster"-like architecture: storage bricks, striping files across multiple nodes, and automatic self healing. I am assuming these features exist in all of the file systems, but Gluster seems to be low cost and professionally supported, as is Cloudera.

Kevin
-
Re: Using HBase on other file systemsEdward Capriolo 2010-05-12, 13:38
Jeff,

Let me first mention that you have described some things as fixed that are only fixed in trunk. I consider trunk futureware, and I do not like to have temporal conversations. Even when trunk becomes current there is no guarantee that the entire problem is solved. After all, were appends fixed in 0.19, or not, or again?

I rescanned the GFS white paper to support my argument that HDFS is stripped down. I found:

- Writes at offset ARE supported
- Checkpoints
- Application-level checkpoints
- Snapshot
- Shadow read-only master

HDFS chose the features it wanted and ignored others; that is why I called it a pure map reduce implementation.

My main point is that HBase by nature needs high-speed random reads and random writes. HDFS by nature is bad at these things. If you cannot keep a high cache hit rate via a large block cache in RAM, HBase is going to slam HDFS doing large block reads for small parts of files.

So you ask me what I would use instead. I do not think there is a viable alternative in the 100 TB and up range, but I do think for people in the 20 TB range something like Gluster that is very performance focused might deliver amazing results in some applications.
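
To put rough numbers on the block-read concern, here is a minimal sketch. It assumes a deliberately simplified model that is not taken from the thread: every block cache miss pulls exactly one whole block from the file system, the block is 64 KB, and the value being fetched is 1 KB. The function name and all figures are illustrative assumptions.

```python
def fs_bytes_per_get(value_bytes, block_bytes, cache_hit_rate):
    """Expected bytes read from the file system per random get.

    Simplified model (an assumption, not a measurement): a cache hit
    costs no file system I/O, a miss reads one full block.
    Returns (expected_bytes, read_amplification).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if value_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    miss_rate = 1.0 - cache_hit_rate
    expected_bytes = miss_rate * block_bytes
    # Amplification: bytes pulled from storage per byte actually wanted.
    return expected_bytes, expected_bytes / value_bytes


VALUE_BYTES = 1024        # hypothetical 1 KB cell
BLOCK_BYTES = 64 * 1024   # assumed 64 KB block

for hit_rate in (0.0, 0.5, 0.75):
    expected, amplification = fs_bytes_per_get(VALUE_BYTES, BLOCK_BYTES, hit_rate)
    print(f"hit rate {hit_rate:.2f}: {expected:.0f} bytes/get, {amplification:.0f}x amplification")
```

Under these assumptions a cold cache reads 64 bytes for every byte wanted, and the figure falls linearly as the hit rate rises, which is why the argument hinges on how much of the working set fits in RAM.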
-
Re: Using HBase on other file systemsAndrew Purtell 2010-05-12, 17:30
Before recommending Gluster I suggest you set up a test cluster and then randomly kill bricks.

Also as pointed out in another mail, you'll want to colocate TaskTrackers on Gluster bricks to get I/O locality, yet there is no way for Gluster to export stripe locations back to Hadoop.

It seems a poor choice.

- Andy
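
As a rough illustration of why exported block locations matter for I/O locality, the sketch below assigns the work for each block to a worker that holds a replica when the storage layer reports one, and falls back to round-robin otherwise. The function, host names, and block IDs are hypothetical; this is not Hadoop's scheduler or any Gluster API.

```python
def schedule(block_locations, workers):
    """Assign each block to a worker, preferring one that stores the block.

    block_locations: dict of block id -> list of hosts holding a replica.
                     An empty list models a store that exports no locations.
    workers:         list of hosts able to run tasks.
    Returns a dict of block id -> (worker, is_local).
    """
    if not workers:
        raise ValueError("need at least one worker")
    assignments = {}
    for index, (block_id, hosts) in enumerate(sorted(block_locations.items())):
        local_workers = [w for w in workers if w in hosts]
        if local_workers:
            # A worker sits on the same node as the data: no network read.
            assignments[block_id] = (local_workers[0], True)
        else:
            # No location information: any worker, data crosses the network.
            assignments[block_id] = (workers[index % len(workers)], False)
    return assignments


workers = ["node1", "node2", "node3"]
with_locations = {"blk_1": ["node1", "node2"], "blk_2": ["node3"]}
without_locations = {"blk_1": [], "blk_2": []}

print(schedule(with_locations, workers))
print(schedule(without_locations, workers))
```

With locations reported, both blocks are processed where they live; with none reported, the same cluster does every read over the network even though tasks and data share the same machines.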
-
Re: Using HBase on other file systemsEdward Capriolo 2010-05-12, 18:15
I did not recommend anything. What I wrote was: "people in the 20 TB range something like gluster that is very performance focused might deliver amazing results in some applications." I used words like "something", "like", and "might". It may just be an interesting avenue of research.

And since you mentioned "also as pointed out in another mail, you'll want to colocate TaskTrackers on Gluster bricks to get I/O locality, yet there is no way for Gluster to export stripe locations back to Hadoop":

1) I am sure if someone was so inclined they could find a way to export that information from Gluster.
2) I think you meant DataNode, not TaskTracker. In any case, I remember reading on list that a RegionServer is not guaranteed to be colocated with a DataNode, especially after a restart. Someone was going to open a ticket for it.
-
Re: Using HBase on other file systemsJeff Hammerbacher 2010-05-13, 04:26
Some projects sacrifice stability and manageability for performance (see, e.g., http://gluster.org/pipermail/gluster-users/2009-October/003193.html).
-
Re: Using HBase on other file systemsEdward Capriolo 2010-05-13, 15:09
Posting a single link from the mailing list is anecdotal. I can point to many posts on hadoop-user, hbase-user, and the list of every product in the world, and come to the determination that the product is unstable as a result. (I am a member of gluster-users, fyi.)

As for Gluster, people are pushing it to do much more than Hadoop. Most are implementing caching and POSIX locks on Gluster, as it works as a true filesystem, not a userspace filesystem with limited semantics like HDFS. So it is going to be more complex and have more problems, but you can do things with it that you cannot do with Hadoop.

I am not claiming that GlusterFS is more or less buggy, or performs better or worse, than HDFS. What I am hypothesizing is: GlusterFS might have a sweet spot. 20 Gluster bricks connected by InfiniBand, with a total storage capacity of 50 TB. Throw HBase on that InfiniBand bad boy and maybe get amazing performance. Just maybe.

Sure, HBase and Hadoop will almost assuredly scale better on the high end, but take into account my hypothesis and use case. Maybe I have a fixed data size but want the best performance possible. It is all about the sweet spot for your needs.

I think HDFS is great, better than great, but I do not think it is the apex of storage technology, nor perfect for every use case. I am not going to stop researching, theorizing, and trying alternative systems and implementations.
-
Re: Using HBase on other file systemsRyan Rawson 2010-05-13, 19:46
Hey,
I think one of the key features of HDFS is its ability to be run on standard hardware and integrate nicely in a standardized datacenter environment. I never would have got my project off the ground if I had to convince my company to invest in infiniband switches.

So in the situation you described, you are getting only 50TB of storage on 20 nodes and the parts list would be something like:
- 20 storage "bricks" w/infiniband and gigE ports
- infiniband switch, min 20 ports - probably better to get more
- 20 more HBase nodes, I'd like to have machines with 16+ GB ram, ideally 24GB and above

At this point we could compare to my cluster setup which has 67TB of raw space reported by HDFS:
- 20 HBase+HDFS nodes, 4TB/node, 16core w/24GB ram

In my case I am paying about $3-4k/node (depending on when you bought them and from whom) and I can leverage the gigE switching fabric (lower cost per port).

So Gluster sounds interesting, but it also sounds at least 2x as expensive for less space. Presumably the performance benefits would make it up, but if the clients aren't connected by infiniband would you really see it? At $1000/port or more I'm not sure it's really worth it.
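The cost comparison in this message can be roughed out as arithmetic. The sketch below uses only the figures quoted above ($3-4k per node, at least $1000 per infiniband port, 67TB vs 50TB). It assumes, purely for illustration, that a storage brick costs the same as an HBase node and that nothing else (gigE switching, racks, power) is counted; neither assumption comes from the thread.

```python
# Rough cost-per-terabyte comparison using the figures quoted in the message.
# ASSUMPTIONS (not from the thread): a storage brick costs the same as an
# HBase node, and only node and infiniband port costs are counted.

def cost_per_tb(total_cost, raw_tb):
    """Return dollars per raw terabyte."""
    return total_cost / raw_tb

NODE_COST_LOW, NODE_COST_HIGH = 3000, 4000   # "$3-4k/node"
IB_PORT_COST = 1000                          # "at least $1000/port"
IB_PORTS = 20                                # "min 20 ports"

# Colocated HBase+HDFS setup: 20 nodes, 67 TB raw.
hdfs_low = cost_per_tb(20 * NODE_COST_LOW, 67)
hdfs_high = cost_per_tb(20 * NODE_COST_HIGH, 67)

# Gluster layout: 20 bricks + 20 HBase nodes + infiniband ports, 50 TB.
GLUSTER_NODES = 40
gluster_low = cost_per_tb(GLUSTER_NODES * NODE_COST_LOW + IB_PORTS * IB_PORT_COST, 50)
gluster_high = cost_per_tb(GLUSTER_NODES * NODE_COST_HIGH + IB_PORTS * IB_PORT_COST, 50)

print(f"HBase+HDFS: ${hdfs_low:,.0f}-${hdfs_high:,.0f} per raw TB")
print(f"Gluster:    ${gluster_low:,.0f}-${gluster_high:,.0f} per raw TB")
```

Changing the brick price or the port count moves the result, so treat the numbers as a way to frame the question rather than a quote.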
-
RE: Using HBase on other file systemsGibbon, Robert, VF-Group 2010-05-13, 20:22
Yo
I feel the need to speak up.

GlusterFS is pretty configurable. It doesn't rely on HBAs but it does support them. Gig or 10G ethernet are also supported options. I would love to see HBase become GlusterFS aware, because the architecture is, frankly, more flexible than HDFS with fewer SPoF concerns. GlusterFS is node aware with the Disco MapReduce framework - why not HBase?

NB. I checked out running HBase over Walrus (an AWS S3 clone): bork - you want me to file a Jira on that?
-
Re: Using HBase on other file systemsRyan Rawson 2010-05-13, 20:53
Hey,
I was more suggesting that the use of infiniband needs more justification - the per-node cost and the proposed storage density above were far below what I would consider reasonable (if I was getting 200-500TB on 20 storage nodes I'd be more amenable).

As for testing GlusterFS, I am obliquely interested, and I would always be interested in hearing people's successes or failures on reasonably sized datasets and Gluster.

As for Walrus, does it have the same eventual consistency promises as S3? It's hard for HBase to be run reliably on a system which cannot read back the file we just wrote (i.e. you won't be able to flush properly).
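The read-back requirement described above can be illustrated with a small probe. This is a minimal sketch, not how HBase itself checks anything: HBase talks to storage through the Hadoop FileSystem interface in Java, whereas this stand-in uses the local filesystem from Python. The function name `read_after_write_ok` and the idea of pointing `directory` at a mounted store are assumptions made for the example.

```python
import os
import tempfile
import uuid

def read_after_write_ok(directory, payload=b"flush-probe"):
    """Write a new file, close it, and immediately read it back.

    Returns True only if the bytes read match the bytes written.
    """
    path = os.path.join(directory, f"probe-{uuid.uuid4().hex}")
    try:
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())      # push the data to the storage layer
        try:
            with open(path, "rb") as f:
                return f.read() == payload
        except FileNotFoundError:
            # A store that has not yet made the new file visible fails here.
            return False
    finally:
        if os.path.exists(path):
            os.remove(path)

with tempfile.TemporaryDirectory() as d:
    result = read_after_write_ok(d)
    print("read-after-write consistent:", result)
```

A single successful probe proves little about an eventually consistent store, since stale reads are intermittent; a check of this kind would need to run many times and under load to be meaningful.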
-
RE: Using HBase on other file systemsAndrew Purtell 2010-05-13, 21:54
You really want to run HBase backed by Eucalyptus' Walrus? What do you have behind that?
> From: Gibbon, Robert, VF-Group
> Subject: RE: Using HBase on other file systems
[...]
> NB. I checked out running HBase over Walrus (an AWS S3
> clone): bork - you want me to file a Jira on that?
-
RE: Using HBase on other file systemsGibbon, Robert, VF-Group 2010-05-14, 19:02
My thinking is around separation of concerns - at an OU level not just at a system integration level. Walrus gives me a consistent, usable abstraction layer to transparently substitute the storage implementation - for example from symmetrix <--> isilon or anything in between. Walrus is storage subsystem agnostic, so it need not be configured for inconsistency like the Amazon service it emulates.

Tight coupling for lock-in is a great commercial technique often seen with suppliers. But it is a bad one. Very bad.
-
Re: Using HBase on other file systemsTodd Lipcon 2010-05-14, 20:01
On Fri, May 14, 2010 at 12:02 PM, Gibbon, Robert, VF-Group <[EMAIL PROTECTED]> wrote:

> My thinking is around separation of concerns - at an OU level not just at a
> system integration level. Walrus gives me a consistent, usable abstraction
> layer to transparently substitute the storage implementation - for example
> from symmetrix <--> isilon or anything in between. Walrus is storage
> subsystem agnostic, so it need not be configured for inconsistency like the
> Amazon service it emulates.
>
> Tight coupling for lock-in is a great commercial technique often seen with
> suppliers. But it is a bad one. Very bad.

However, reasonably tight coupling between a database (HBase) and its storage layer (HDFS) is IMHO absolutely necessary to achieve a certain level of correctness and performance. In HBase's case we use the Hadoop FileSystem interface, so in theory it will work on any filesystem that implements said interface, but I wouldn't run a production instance on anything but HDFS.

It's worth noting that most commercial databases operate on direct block devices rather than on top of filesystems, so that they don't have to deal with varying semantics/performance between ext3, ext4, xfs, ufs, and the myriad other single-node filesystems that exist.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
-
RE: Using HBase on other file systemsGibbon, Robert, VF-Group 2010-05-14, 21:15
Hmm. What level of IOPS does HBase need in order to support a reasonably responsive level of service? How much latency in transfer times is acceptable before the nodes start to fail? Do you use asynchronous IO queueing? Write-through caching? Prefetching?
-
Re: Using HBase on other file systemsTodd Lipcon 2010-05-15, 01:51
On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group <[EMAIL PROTECTED]> wrote:

> Hmm. What level of IOPs does Hbase need in order to support a reasonably
> responsive level of service? How much latency in transfer times is
> acceptable before the nodes start to fail? Do you use asynchronous IO
> queueing? Write-through caching? Prefetching?

Hi Robert. Have you read the Bigtable paper? It's a good description of the general IO architecture of BigTable. You can also read the original paper on Log-structured merge tree storage from back in the 90s.

To answer your questions in brief:
- Typical clusters run on between 4 and 12x 7200RPM SATA disks. Some people run on 10k disks to get more random reads per second, but that's not necessary.
- Latency in transfer times is a matter of what your application needs, not a matter of what HBase needs.
- No, we do not asynchronously queue reads - AIO support is lacking in Java 6 and even in the current previews of Java 7 it is a thin wrapper around threadpools and synchronous IO APIs.
- HBase uses log-structured storage, which is somewhat the same as write-through caching in a way. We never do random writes (in fact they're impossible in HDFS).

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
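The log-structured storage point can be sketched as a toy in a few lines. This is an illustration of the general idea only, not HBase's implementation: the class name `TinyLSM`, the in-memory segments, and the tiny flush threshold are all assumptions for the example, and it leaves out the write-ahead log and compaction that a real store needs.

```python
import bisect

class TinyLSM:
    """Toy log-structured store: writes land in memory, flushes append
    immutable sorted segments, and nothing is ever rewritten in place."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}            # recent writes, mutable
        self.segments = []            # oldest first; each a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one new sorted, immutable segment."""
        if self.memtable:
            self.segments.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):   # newest segment wins
            keys = [k for k, _ in segment]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return segment[i][1]
        return None

store = TinyLSM(memtable_limit=2)
store.put("row1", "a")
store.put("row2", "b")    # reaches the limit, so the memtable is flushed
store.put("row1", "c")    # newer value shadows the flushed one
print(store.get("row1"), store.get("row2"), store.get("missing"))
```

Every flush is a sequential append of a whole segment, which is why this style of storage fits a filesystem that does not allow random writes.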
-
RE: Using HBase on other file systemsGibbon, Robert, VF-Group 2010-05-15, 20:19
Todd thanks for replying. 4x 7200 spindles and no RAID = approx 360 IOPS to/from the backend storage, minimum and per node to run an HBase cluster.

Right?

cheers
Robert
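The 360 IOPS figure above is consistent with a common back-of-the-envelope estimate for random reads on rotating disks. The sketch below shows that arithmetic; the 7 ms average seek time is an assumed typical value for a 7200RPM SATA disk, not a number from the thread, and real drives and workloads vary.

```python
def spindle_iops(rpm, avg_seek_ms):
    """Estimate random IOPS for one disk as 1 / (seek + rotational latency)."""
    # On average the platter must turn half a revolution to reach the data.
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)

AVG_SEEK_MS = 7.0          # ASSUMPTION: typical 7200RPM SATA seek time
SPINDLES_PER_NODE = 4      # from the message: "4x 7200 spindles and no RAID"

per_disk = spindle_iops(7200, AVG_SEEK_MS)
per_node = SPINDLES_PER_NODE * per_disk

print(f"per disk: {per_disk:.0f} IOPS, per node: {per_node:.0f} IOPS")
```

The estimate describes what four such disks can deliver for random reads, which is a capacity figure and says nothing by itself about what a given workload requires.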
-
Re: Using HBase on other file systemsbaleksan@... 2010-05-15, 20:43
®erred
Sent via BlackBerry from T-Mobile
-
RE: Using HBase on other file systemsAndrew Purtell 2010-05-15, 21:30
No, Todd was not specifying some kind of minimum. The point was the more spindles, the better for an I/O parallel architecture like HDFS and BigTable. Have you read the BigTable paper?
- Andy
-
Re: Using HBase on other file systemsTodd Lipcon 2010-05-15, 22:27
On Sat, May 15, 2010 at 1:19 PM, Gibbon, Robert, VF-Group <
[EMAIL PROTECTED]> wrote:

> Todd thanks for replying. 4x 7200 spindles and no RAID = approx 360 IOPS to/from the backend storage, minimum and per node to run an HBase cluster.

If you want to achieve 360 random reads per second per node, then yes :)

If you're only doing scans, or you're rarely reading (e.g. an archival storage system), then you hardly need any random read capacity at all. My laptop may have 4G of RAM, but does that mean that all laptops need 4G to work? Only if you want to put 4G of data in memory!

-Todd

> -----Original Message-----
> From: Todd Lipcon [mailto:[EMAIL PROTECTED]]
> Sent: Sat 5/15/2010 3:51 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Using HBase on other file systems
>
> On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group <[EMAIL PROTECTED]> wrote:
>
> > Hmm. What level of IOPS does HBase need in order to support a reasonably responsive level of service? How much latency in transfer times is acceptable before the nodes start to fail? Do you use asynchronous IO queueing? Write-through caching? Prefetching?
>
> Hi Robert. Have you read the Bigtable paper? It's a good description of the general IO architecture of BigTable. You can also read the original paper on log-structured merge tree storage from back in the 90s.
>
> To answer your questions in brief:
> - Typical clusters run on between 4 and 12x 7200RPM SATA disks. Some people run on 10k disks to get more random reads per second, but that is not necessary.
> - Latency in transfer times is a matter of what your application needs, not a matter of what HBase needs.
> - No, we do not asynchronously queue reads - AIO support is lacking in Java 6, and even in the current previews of Java 7 it is a thin wrapper around threadpools and synchronous IO APIs.
> - HBase uses log-structured storage, which is somewhat the same as write-through caching in a way. We never do random writes (in fact they're impossible in HDFS).
>
> -Todd
>
> > On Fri, May 14, 2010 at 12:02 PM, Gibbon, Robert, VF-Group <[EMAIL PROTECTED]> wrote:
> >
> > > My thinking is around separation of concerns - at an OU level, not just at a system integration level. Walrus gives me a consistent, usable abstraction layer to transparently substitute the storage implementation - for example from symmetrix <--> isilon or anything in between. Walrus is storage subsystem agnostic, so it need not be configured for inconsistency like the Amazon service it emulates.
> > >
> > > Tight coupling for lock-in is a great commercial technique often seen with suppliers. But it is a bad one. Very bad.
> >
> > However, reasonably tight coupling between a database (HBase) and its storage layer (HDFS) is IMHO absolutely necessary to achieve a certain level of correctness and performance. In HBase's case we use the Hadoop FileSystem interface, so in theory it will work on anything that implements said interface, but I wouldn't run a production instance on anything but HDFS.
> >
> > It's worth noting that most commercial databases operate on direct block devices rather than on top of filesystems, so that they don't have to deal with varying semantics/performance between ext3, ext4, xfs, ufs, and the myriad other single-node filesystems that exist.
> >
> > -Todd
> >
> > > -----Original Message-----
> > > From: Andrew Purtell [mailto:[EMAIL PROTECTED]]
> > > Sent: Thu 5/13/2010 11:54 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: RE: Using HBase on other file systems
> > >
> > > You really want to run HBase backed by Eucalyptus' Walrus? What do you have behind that?
> > >
> > > > From: Gibbon, Robert, VF-Group
> > > > Subject: RE: Using HBase on other file systems
> > > [...]
> > > > NB. I checked out running HBase over Walrus (an AWS S3

Todd Lipcon
Software Engineer, Cloudera
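The 360 IOPS figure discussed above can be checked with a back-of-the-envelope calculation. The sketch below is a rough illustration, not something posted in the thread: it assumes a random read costs one average seek plus half a rotation, and the 7 ms average seek time is an assumed value picked for illustration. The function names are hypothetical.

```python
def disk_random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-read IOPS for a single spinning disk.

    Assumes each random read costs one average seek plus, on average,
    half a platter rotation. Transfer time and caching are ignored.
    """
    # One full rotation takes 60,000 ms / rpm; on average we wait half of it.
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def node_random_iops(disks: int, rpm: float, avg_seek_ms: float) -> float:
    """Aggregate estimate for independent spindles (no RAID), as in the thread."""
    return disks * disk_random_iops(rpm, avg_seek_ms)


# Assumed inputs: 7200 RPM disks with a 7 ms average seek (illustrative value).
per_disk = disk_random_iops(rpm=7200, avg_seek_ms=7.0)
per_node = node_random_iops(disks=4, rpm=7200, avg_seek_ms=7.0)
print(f"per disk: {per_disk:.0f} IOPS, per node (4 disks): {per_node:.0f} IOPS")
```

Under these assumptions a 7200 RPM disk comes out at roughly 90 random reads per second, so four independent spindles land near the 360 IOPS mentioned above. As Todd points out, that number only matters for workloads dominated by random reads; scan-heavy or archival workloads need far less.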
-
RE: Using HBase on other file systems Gibbon, Robert, VF-Group 2010-05-16, 08:22
Ok I will read the paper again in more detail. It would be a big help if you published some recommended baseline deployment specs for HBase for typical OLTP and OLAP configurations. Maybe you already did and I missed them.

take it easy

-----Original Message-----
From: Todd Lipcon [mailto:[EMAIL PROTECTED]]
Sent: Sun 5/16/2010 12:27 AM
To: [EMAIL PROTECTED]
Subject: Re: Using HBase on other file systems

On Sat, May 15, 2010 at 1:19 PM, Gibbon, Robert, VF-Group <[EMAIL PROTECTED]> wrote:

> Todd thanks for replying. 4x 7200 spindles and no RAID = approx 360 IOPS to/from the backend storage, minimum and per node to run an HBase cluster.

If you want to achieve 360 random reads per second per node, then yes :)

If you're only doing scans, or you're rarely reading (eg an archival storage system) then you hardly need any random read capacity at all. My laptop may have 4G of RAM, but does that mean that all laptops need 4G to work? Only if you want to put 4G of data in memory!

-Todd