-
HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
D S 2012-03-05, 08:25
Hi,
I'm learning more about HBase and I'm curious how much of HBase is actually based on Google's original database. In Google's origin stories, they are well known for using low-cost commodity hardware at scale to store their web database.
Almost every blog I read about HBase tells me it's a clone of BigTable. Almost every blog I've read about HBase also tells me to use a lot of RAM, gigabytes' worth. Some even tell me not to consider HBase with less than 4GB of RAM.
If I remember my history correctly, a commodity machine in 2003 had around 512MB to 1GB of RAM; the fancier ones had 2GB. From everything I've read, running HBase on such machines is a very bad idea, yet these were the machines readily available in 2003, when Google started its growth.
I'm confused at the moment. Can someone give me a bit of background on how HBase performs at the "low" end, which was considered the "high" end back then? Should I assume that HBase is just a clone of BigTable? What is HBase's history? Are the blogs wrong?
Thanks for any clarification anyone can give.
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Doug Meil 2012-03-05, 13:21
re: "Almost every blog I read about HBase tells me it's a clone of BigTable."
The HBase website says that too... http://hbase.apache.org/
re: "Almost every blog I've read about HBase also tells me to use a lot of RAM"
So does the HBase Reference Guide... http://hbase.apache.org/book.html#perf.os
For more information, see... http://hbase.apache.org/book.html#other.info
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Michael Drzal 2012-03-05, 14:45
You really need to consider the entire historical context here. A lot of the memory used in HBase goes to buffering writes to disk and to the block cache. These days, it isn't unreasonable to get twelve 2-3TB disks in a commodity server. Back in 2003, you would not get as many disks, and they would be much smaller. One way to think about it is the ratio of RAM to disk space or, more operationally, what your cache hit ratio is and how busy your disk drives are.
Drz
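Michael's RAM-to-disk framing can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the 2003-era node (1GB RAM, two 80GB disks) and the 48GB of RAM on the 2012-era node are assumed figures, and only the twelve 3TB disks come from the message. The helper names are hypothetical.

```python
# Illustrative arithmetic only: hardware figures below are assumptions,
# except the twelve 3TB disks mentioned in the thread.

def ram_to_disk_ratio(ram_gb: float, disks: int, disk_gb: float) -> float:
    """Fraction of raw disk capacity that RAM could hold."""
    return ram_gb / (disks * disk_gb)


def ram_needed_gb(target_ratio: float, disks: int, disk_gb: float) -> float:
    """RAM required to keep `target_ratio` of raw disk capacity in memory."""
    return target_ratio * disks * disk_gb


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Share of block reads served from cache rather than disk."""
    total = hits + misses
    return hits / total if total else 0.0


# Assumed 2003-era node: 1GB RAM, two 80GB disks.
old = ram_to_disk_ratio(ram_gb=1, disks=2, disk_gb=80)
# Assumed 2012-era node: 48GB RAM, twelve 3TB (3000GB) disks.
new = ram_to_disk_ratio(ram_gb=48, disks=12, disk_gb=3000)

print(f"2003-style node: {old:.4%} of disk fits in RAM")  # 0.6250%
print(f"2012-style node: {new:.4%} of disk fits in RAM")  # 0.1333%
print(f"RAM to match the 2003 ratio: {ram_needed_gb(old, 12, 3000):.1f}GB")  # 225.0GB
print(f"example cache hit ratio: {cache_hit_ratio(900, 100):.0%}")  # 90%
```

With these assumed numbers, the modern node keeps a smaller fraction of its disk in RAM than the old one did, and matching the old ratio would take about 225GB of RAM. That is one way to read the point that more and larger disks pull memory requirements up.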
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
D S 2012-03-05, 19:39
Are HBase's configuration options robust enough that, with a bit of tweaking, it could go back and run well on those 2003 specs, if that were desired?
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Alan Chaney 2012-03-05, 20:23
On 3/5/2012 11:39 AM, D S wrote: > Is HBase's configuration options robust enough that it could go back > and run well on those 2003 specs by a bit of tweaking if that what was > desired?
What do you mean by "run well"? Run as well as BigTable would have done on the same machines? (Probably only someone who worked on BigTable would be in a position to comment on that.) Run without crashing? Run at XXX I/O operations per second?
Since 2003, roughly speaking, at the same price point for a "commodity" machine:
Network I/O has increased by a factor of 10: 100Mbps was typical in such a machine; now 1Gbps is typical and 10Gbps is available.
Disk I/O has increased by a factor of about 5 to 10 (3Gb/s SATA vs. ATA-100, faster rotation and seek times).
Disk price per GB has dropped by about a factor of 10.
RAM performance has increased by a factor of somewhere between 5 and 10.
CPU performance for a typical "commodity" machine has gone from, say, a 1GHz single core to a 2.5-3GHz quad- or 8-core, so say 20-30x overall.
Add to that the fact that a lot of people on this list use virtualized instances, and the equations get even more complicated and confusing.
What's your point? Do you want to know how to set up a minimal HBase node that works on a 512MB machine? Purely for testing purposes, I've run a VM with only 750MB of RAM and it worked, but I wasn't pushing very much data through it.
Alan
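As a rough check on Alan's CPU estimate, the sketch below multiplies clock speed by core count using only the figures from his message. It ignores per-clock (IPC) and memory-system improvements and assumes perfect scaling across cores, so it is a crude proxy, not a benchmark; the function name is hypothetical.

```python
# Crude proxy: aggregate clock = GHz x cores. Figures are those quoted in
# the message; per-clock (IPC) gains and imperfect core scaling are ignored.

BASELINE_GHZ, BASELINE_CORES = 1.0, 1  # "1GHz single core" circa 2003


def aggregate_clock(ghz: float, cores: int) -> float:
    """Total GHz across all cores."""
    return ghz * cores


baseline = aggregate_clock(BASELINE_GHZ, BASELINE_CORES)
low = aggregate_clock(2.5, 4) / baseline   # 2.5GHz quad-core
high = aggregate_clock(3.0, 8) / baseline  # 3GHz 8-core
print(f"clock x cores speedup: {low:.0f}x to {high:.0f}x")  # 10x to 24x
```

The low end of that range falls short of the 20-30x quoted, so reaching it would presumably also depend on per-clock improvements, which this sketch does not model.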
-
RE: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Sandy Pratt 2012-03-05, 21:50
I have HBase instances with a 2GB heap that perform OK. I'm sure they would perform better with more RAM, but they are definitely good enough to test queries and so forth. I bet you could get down to 1.5GB or 1GB and still be stable if you wanted to.
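To see what a small heap leaves for the two memory consumers Michael mentioned (the write buffer, or MemStore, and the block cache), here is a hypothetical sketch. The default fractions are our assumption of the 0.92-era values of `hfile.block.cache.size` (0.25) and `hbase.regionserver.global.memstore.upperLimit` (0.4); check `hbase-default.xml` for your version. The heap itself is normally set with `HBASE_HEAPSIZE` in `conf/hbase-env.sh`. The function name is illustrative only.

```python
# Hypothetical sketch: divide a RegionServer heap between the block cache,
# the global MemStore limit, and everything else. Default fractions are
# assumed 0.92-era values; verify them against your hbase-default.xml.

def heap_budget_mb(heap_mb: float, block_cache_frac: float = 0.25,
                   memstore_frac: float = 0.4) -> dict:
    """Return approximate MB for each consumer of the heap."""
    if block_cache_frac + memstore_frac >= 1.0:
        raise ValueError("block cache + memstore fractions must leave headroom")
    block_cache = heap_mb * block_cache_frac
    memstore = heap_mb * memstore_frac
    return {
        "block_cache": block_cache,
        "memstore": memstore,
        "everything_else": heap_mb - block_cache - memstore,
    }


# Heap sizes mentioned in the thread (2GB, 1.5GB, 1GB) plus a 512MB heap.
for heap in (2048, 1536, 1024, 512):
    b = heap_budget_mb(heap)
    print(f"{heap:>5}MB heap: block cache {b['block_cache']:.0f}MB, "
          f"memstore {b['memstore']:.0f}MB, other {b['everything_else']:.0f}MB")
```

Note that on a 512MB machine the heap would have to be smaller still, since the OS and any co-located daemons (such as an HDFS DataNode) need memory too.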
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
lars hofhansl 2012-03-05, 22:12
This is a hypothetical question. Why do you care? Can you run current Windows on '03 machines? Or Linux (with KDE/Gnome)?
HBase is designed for modern machines.
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
D S 2012-03-06, 00:28
Simple: I want to understand what is meant by the claim that HBase = BigTable. How far does this claim go?
How identical are the two products? Does it stop at the front-end specifications? Does it go into the internals? I just want to know how similar these two products are and how they differ.
If I took the current build of HBase back in a time machine and installed it on all those circa-2003 Google servers (and not one server more), would I end up with something similar to what Google had back then?
Is there anyone on this mailing list who has any experience with BigTable (older versions)?
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Ian Varley 2012-03-06, 00:48
DS,
HBase is an open source project, so you can read the source code and make that determination for yourself. It was originally created based on the same ideas as the Bigtable paper (published by Google), but it is related only through its design goals and philosophy, not its actual implementation.
BigTable, conversely, is a proprietary system designed and run by Google. They don't share the source code, nor license it outside of Google in any way. So if you want an actual comparison, you'll have to go work at Google. :)
I don't think there's anyone claiming that HBase = Bigtable; simply that it's based on the same ideas, and is intended as an open source implementation of the same concept.
Ian
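Ian's point that the two systems share ideas rather than code can be made concrete with the data model the Bigtable paper describes and HBase also exposes: a sorted map from (row key, column, timestamp) to an uninterpreted value. The class below is a toy, in-memory sketch of that model only; the class name and values are hypothetical (the row and column names echo the paper's Webtable example), and it says nothing about how either system actually stores data.

```python
from bisect import bisect_left, insort


class ToySortedMap:
    """Toy, in-memory (row, column, timestamp) -> value map.

    Illustrates the data model only: no persistence, distribution,
    compaction, or byte-level encoding as in HBase or Bigtable.
    """

    def __init__(self):
        self._keys = []   # kept sorted: (row, column, -timestamp)
        self._cells = {}  # key -> value

    def put(self, row, column, ts, value):
        key = (row, column, -ts)  # negated so newer versions sort first
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def versions(self, row, column):
        """All (timestamp, value) pairs for one cell, newest first."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        out = []
        while i < len(self._keys) and self._keys[i][:2] == (row, column):
            key = self._keys[i]
            out.append((-key[2], self._cells[key]))
            i += 1
        return out

    def get(self, row, column):
        """Newest value for one cell, or None if the cell is empty."""
        found = self.versions(row, column)
        return found[0][1] if found else None

    def rows(self):
        """Distinct row keys in sorted order."""
        seen = []
        for row, _column, _neg_ts in self._keys:
            if not seen or seen[-1] != row:
                seen.append(row)
        return seen


t = ToySortedMap()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("com.bbc.www", "contents:", 1, "<html>bbc")
print(t.get("com.cnn.www", "contents:"))       # <html>v5
print(t.versions("com.cnn.www", "contents:"))  # [(5, '<html>v5'), (3, '<html>v3')]
print(t.rows())                                # ['com.bbc.www', 'com.cnn.www']
```

Two independent implementations can expose this same model while differing completely underneath, which is the sense in which HBase is "based on the same ideas".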
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Andrew Purtell 2012-03-06, 01:09
> On Mar 5, 2012, at 6:28 PM, D S wrote: > Simple, I want to see what is meant by the claim that HBase = Big Table. > How far does this claim go?
Who is making this claim?
I think we say that HBase is a BigTable clone because it attempts to be faithful to the BigTable architecture as described in Google's BigTable paper of 2006.
> How identical are the two products? Does it stop at the fronted > specifications?
An architectural comparison would be valid. Pull the Google BigTable paper, then grab a copy of Lars George's HBase book, or read the source, and/or ask questions.
> Does it go into the internals? I just want to know how identical
> these two products are and how different are the two.
> If I took the current build of HBase and had a time machine and installed > it in all those circa 2003 Google servers (and not one server more), would > I end up with something similar to what Google had back then?
Google's BigTable is closed source, so who can say.
Best regards, - Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
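Andrew suggests an architectural comparison between the paper and HBase. One starting point is the rough term mapping below, from Bigtable paper names to the corresponding 0.92-era HBase names, to the best of our understanding; verify it against the paper and the HBase Reference Guide. The dictionary and the `hbase_term` helper are hypothetical conveniences, not part of either system.

```python
# Rough mapping from Bigtable-paper terms to 0.92-era HBase terms
# (our understanding; check against the paper and the HBase docs).
BIGTABLE_TO_HBASE = {
    "tablet": "region",
    "tablet server": "RegionServer",
    "master": "HMaster",
    "memtable": "MemStore",
    "SSTable": "HFile",
    "commit log": "HLog (write-ahead log)",
    "minor compaction": "flush (MemStore written out as a new HFile)",
    "major compaction": "major compaction",
    "METADATA table": ".META. table",
    "root tablet": "-ROOT- table",
    "GFS": "HDFS",
    "Chubby": "ZooKeeper",
}

_LOOKUP = {k.lower(): v for k, v in BIGTABLE_TO_HBASE.items()}


def hbase_term(bigtable_term: str) -> str:
    """Return the HBase name for a Bigtable-paper term (case-insensitive)."""
    try:
        return _LOOKUP[bigtable_term.strip().lower()]
    except KeyError:
        raise KeyError(f"no mapping recorded for {bigtable_term!r}") from None


for paper_term, hbase_name in BIGTABLE_TO_HBASE.items():
    print(f"{paper_term:<17} -> {hbase_name}")
```

Matching names do not imply matching code; as noted earlier in the thread, the implementations are independent.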
-
Re: HBase & BigTable + History: Can it run decently on a 512MB machine? What's the difference between the two?
Doug Meil 2012-03-06, 02:13
To stress what Andrew said, the HBase homepage says: "HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data <http://research.google.com/archive/bigtable.html> by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS." Note the phrases "modeled after" and "Bigtable-like", not "is 100% exactly the same as."