-
hbase as a primary store, or is it more for "2nd class" data?
S Ahmed 2012-05-14, 00:14
I'm interested to learn whether people are using HBase as a primary store, or whether it is more for "2nd class" data.
Pretend you have a CMS product or an eCommerce SaaS application.
By "primary store" I mean storing the actual content (say, articles or blog posts), category data, user information, shopping cart orders, or product information.
"2nd class" data is data like metrics, analytics, log data, or index data (data that can be rebuilt from the primary store).
In general, 2nd class data is data that, if lost, won't bring the business to its knees.
What do you guys think, am I right?
That is, if you are creating a SaaS product, it wouldn't be advisable to build it on HBase (or it would be a kind of bleeding-edge architecture).
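To make the "can be rebuilt from the primary store" criterion concrete, here is a minimal Python sketch (plain Python, not HBase code; `articles` and `build_tag_index` are hypothetical names) in which an index is purely derived data: throwing it away loses nothing, because it can be regenerated from the primary records.

```python
from collections import defaultdict

# Hypothetical "primary store": article id -> article record (first-class data).
articles = {
    "a1": {"title": "Intro to HBase", "tags": ["hbase", "nosql"]},
    "a2": {"title": "Sharding MySQL", "tags": ["mysql"]},
    "a3": {"title": "HBase schema design", "tags": ["hbase"]},
}


def build_tag_index(primary):
    """Derive a tag -> sorted list of article ids index ("2nd class" data)."""
    index = defaultdict(set)
    for article_id, record in primary.items():
        for tag in record["tags"]:
            index[tag].add(article_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


index = build_tag_index(articles)

# Simulate losing the derived data entirely...
index = None

# ...and rebuilding it from the primary store: nothing is permanently lost.
rebuilt = build_tag_index(articles)
print(rebuilt["hbase"])  # ['a1', 'a3']
```

The reverse does not hold: if `articles` itself were lost, no index could bring it back, and that is the kind of record the question is about.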
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Andrew Purtell 2012-05-14, 00:53
We use HBase as a primary store for some data.
The best answer I think you can expect to your line of questioning is "it depends."
It is fair to say a SaaS product built on HBase (or any so-called "NoSQL" store) would be bleeding edge, and advisable only if you really know what you are doing and more conventional alternatives are not going to work; or if it's the second iteration of the service and the initial implementation simply isn't going to scale the way you need it to or want it to.
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Otis Gospodnetic 2012-05-14, 02:00
Hi Ahmed,
At Sematext we have a few SaaS products that use HBase as the primary data store. I hear Facebook uses HBase for some important stuff, too. ;)
So far we've survived. HBase does have rough edges, but also good developers who are making it better every day.
Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm
-
Re: hbase as a primary store, or is it more for "2nd class" data?
S Ahmed 2012-05-14, 02:21
Otis,
It kind of goes back to what I was saying earlier: if FB is using it for searching your inbox, or for storing your chat messages or wall posts, I don't really think that data is important (and really it isn't, hehe).
I was just making an observation and wanted to get a feel for what others think. Obviously every tool has its purpose and domain, and I was curious what others have seen in production.
(I do realize that in some use cases the data is very important, like analytics data that usually correlates to advertising $$.)
-
RE: hbase as a primary store, or is it more for "2nd class" data?
Srikanth P. Shreenivas 2012-05-14, 04:21
There is a possibility that you may lose data, and hence I would not use it for first-class data if the data cannot be re-created.
If you can derive the data from a secondary source and store it in HBase for performance gains, then it is a viable use case.
Regards,
Srikanth
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Andrew Purtell 2012-05-14, 07:51
Any data store may lose data, as a generic statement, so maybe you had something more specific in mind?
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Ian Varley 2012-05-14, 14:48
Ahmed,
Generally speaking, the intent of HBase IS to be a first-class data store. It's a young data store (not even 1.0), so you have to take that into account; but a lot of engineering has gone into making it fully safe, and known data safety issues are considered release blockers. (This assumes you run with the WAL enabled, have at least 3 replicas in HDFS, etc.; that is, you follow good data safety practices.)
The data loss scenarios I've heard of are mostly of the "byzantine" variety. For example, if an entire data center loses power, you may lose a few seconds of data that had been synced in the WAL but not fsynced (i.e. flushed by the OS to magnetic media). There are also various known bugs involving multiple-failure scenarios where it could lose data (for example, multiple successive node failures during replication). To my knowledge, there are no known "simple" cases where HBase will lose data.
For that matter, relational DBs can lose data too (I've seen it happen recently because of a HW failure). So ultimately, it comes down to how valuable the data is to you, and how many redundant measures you're willing to take to guard against increasingly rare situations. Are you accounting for earthquakes? Solar flares? :)
Ian
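The distinction Ian draws between a WAL edit being synced (acknowledged by the HDFS replicas) and being fsynced (forced to disk by the OS) can be illustrated with a toy model. This is a simplified Python simulation under stated assumptions, not HBase or HDFS code: each replica keeps acknowledged edits in an OS buffer until an explicit fsync, one node losing power wipes only that node's buffer, and a data-center power outage wipes every node's buffer at once. `ReplicaNode`, `write`, `surviving_edits`, and `make_cluster` are hypothetical names.

```python
class ReplicaNode:
    """Toy replica: acknowledged edits sit in an OS buffer until fsync()."""

    def __init__(self, name):
        self.name = name
        self.os_buffer = []  # received and acknowledged, but only in memory
        self.disk = []       # survives a power loss

    def receive(self, edit):
        self.os_buffer.append(edit)

    def fsync(self):
        # Force buffered edits to durable storage.
        self.disk.extend(self.os_buffer)
        self.os_buffer.clear()

    def lose_power(self):
        # Volatile memory is gone; whatever reached disk survives.
        self.os_buffer.clear()

    def contents(self):
        return self.disk + self.os_buffer


def write(nodes, edit):
    """Acknowledge an edit once every replica holds it in memory (no fsync)."""
    for node in nodes:
        node.receive(edit)


def surviving_edits(nodes):
    """An edit survives if at least one replica still holds it."""
    survivors = set()
    for node in nodes:
        survivors.update(node.contents())
    return sorted(survivors)


def make_cluster():
    nodes = [ReplicaNode(f"dn{i}") for i in range(3)]  # 3 replicas
    write(nodes, "edit-1")
    for node in nodes:
        node.fsync()          # edit-1 has reached disk on every replica
    write(nodes, "edit-2")    # edit-2 is acknowledged but not yet fsynced
    return nodes


# Scenario A: a single node loses power; the other replicas still hold edit-2.
cluster = make_cluster()
cluster[0].lose_power()
after_single_failure = surviving_edits(cluster)

# Scenario B: the whole data center loses power at the same moment.
cluster = make_cluster()
for node in cluster:
    node.lose_power()
after_outage = surviving_edits(cluster)

print(after_single_failure)  # ['edit-1', 'edit-2']
print(after_outage)          # ['edit-1']
```

The model captures only this one effect. Real HDFS write pipelines, HBase's WAL sync behaviour, and OS write-back are more involved, and exactly when acknowledged edits reach disk may depend on the versions and settings in use.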
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Srikanth P. Shreenivas 2012-05-14, 19:35
Yes, agreed that data can be lost in any DB. However, isn't it seen more frequently in NoSQL DBs? In the case of HBase, isn't it possible for the underlying HDFS to lose data if nodes go down abruptly a few times?
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Amandeep Khurana 2012-05-14, 19:41
HDFS is designed not to lose data if a few nodes fail. It holds multiple replicas of each block. Having said that, it also depends on the definition of "a few".
Many companies are using HDFS as their central data store, and it's proven at scale in production. It does not lose data arbitrarily, and neither does HBase.
Have you come across a case where you experienced data loss with either HDFS or HBase? We'd be curious to learn about it.
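Amandeep's caveat about "a few" can be quantified with a back-of-the-envelope calculation. Assume (a simplification; real HDFS replica placement is rack-aware, not uniformly random) that a block's r replicas sit on r distinct nodes chosen uniformly at random from N, and that k nodes fail at the same moment, before HDFS has time to re-replicate. The block is lost only if all r replica holders are among the k failed nodes, which happens with probability C(k, r) / C(N, r). The Python sketch below (function names are hypothetical, cluster sizes are placeholders) evaluates that and the expected number of lost blocks.

```python
from math import comb


def block_loss_probability(n_nodes, replication, failed):
    """Probability that every replica of one block is on a failed node.

    Assumes the block's replicas are on `replication` distinct nodes chosen
    uniformly at random out of `n_nodes`, and that `failed` nodes go down at
    the same time, before any re-replication can happen.
    """
    # comb(failed, replication) is 0 when failed < replication.
    return comb(failed, replication) / comb(n_nodes, replication)


def expected_lost_blocks(n_blocks, n_nodes, replication, failed):
    """Expected number of blocks with no surviving replica."""
    return n_blocks * block_loss_probability(n_nodes, replication, failed)


# Hypothetical cluster: 20 DataNodes, replication factor 3, 1,000,000 blocks.
for k in (1, 2, 3, 5):
    p = block_loss_probability(20, 3, k)
    lost = expected_lost_blocks(1_000_000, 20, 3, k)
    print(f"{k} simultaneous failures: P(block lost) = {p:.6f}, "
          f"expected lost blocks = {lost:.1f}")
```

With these placeholder numbers, one or two failures can never lose a triple-replicated block, while three simultaneous failures give a per-block probability of 1/1140, or roughly 877 expected lost blocks out of a million. That is why the definition of "a few" matters, and why failures spaced far enough apart for HDFS to re-replicate are much less dangerous than simultaneous ones.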
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Andrew Purtell 2012-05-14, 19:42
You don't really help anyone evaluate their options if you just throw out nonspecific statements like "There is a possibility that you may lose data" ... of course there is, with anything. Do you have first or secondhand knowledge of some specific incident where someone lost data using HBase? Not a challenge, rather, looking for something constructive here.
- Andy
-
Re: hbase as a primary store, or is it more for "2nd class" data?
Amandeep Khurana 2012-05-14, 19:47
Ahmed,
I'll second what Ian and Andrew have highlighted. HBase is very capable of being used as a primary store as long as you run it following best practices.
It's a useful exercise to clearly define the failure scenarios you want to safeguard against and what kind of SLAs you have for recovering from them. That will help you evaluate whether HBase has the features and stability required to fulfill your requirements.
It's hard to make a general statement about whether HBase makes sense as a primary store or not. In many cases it does and in many cases it does not, and the decision depends on your requirements.
Hope that helps.
-Amandeep
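One way to carry out the exercise Amandeep suggests is to write the failure scenarios and recovery targets down explicitly and compare them with what you have measured or expect for your own deployment. The sketch below is purely illustrative: the scenario names echo this thread, but every number is a hypothetical placeholder to be replaced with your own requirements and test results, and `Scenario` and `unmet_requirements` are invented names. The two targets correspond to the usual recovery point objective (tolerable data-loss window) and recovery time objective.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    max_data_loss_s: float       # requirement: tolerable data-loss window (RPO), seconds
    max_recovery_s: float        # requirement: tolerable time to recover (RTO), seconds
    observed_data_loss_s: float  # measured or expected for your deployment
    observed_recovery_s: float


def unmet_requirements(scenarios):
    """Return the names of scenarios where the deployment misses a target."""
    return [
        s.name
        for s in scenarios
        if s.observed_data_loss_s > s.max_data_loss_s
        or s.observed_recovery_s > s.max_recovery_s
    ]


# All numbers below are HYPOTHETICAL placeholders, not HBase measurements.
scenarios = [
    Scenario("single RegionServer crash", 0, 300, 0, 120),
    Scenario("single DataNode disk failure", 0, 60, 0, 10),
    Scenario("full data-center power outage", 0, 3600, 5, 1800),
]

print(unmet_requirements(scenarios))  # ['full data-center power outage']
```

Any scenario the check flags is a place where you either need additional redundant measures, in Ian's sense, or a different store.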