-
Region replication? Otis Gospodnetic 2011-04-19, 19:33
Hi,

I imagine lots of HBase folks have read or will want to read http://blog.milford.io/2011/04/why-i-am-very-excited-about-datastaxs-brisk/ , including comments.

My question has to do with one of the good comments from Edward Capriolo, who pointed out that some of the Configurations he described in his Cassandra as Memcached talk ( http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) are not possible with HBase because in HBase there is only 1 copy of any given Region and it lives on a single RegionServer (I'm assuming this is correct?), thus making it impossible to spread reads of data from one Region over multiple RegionServers:

http://blog.milford.io/2011/04/why-i-am-very-excited-about-datastaxs-brisk/#comment-187253604

So I poked around on search-hadoop.com and JIRA, and looked at http://hbase.apache.org/book/regions.arch.html to see about this limitation, whether it's even mentioned as a limitation, whether there are plans to change it or if there are some configuration alternatives that would make some of those configurations described by Ed possible with HBase, but I actually didn't find any explicit information about that.

Would anyone care to comment? :)

Many thanks,
Otis
--
We're hiring HBase hackers for Data Mining and Analytics
http://blog.sematext.com/2011/04/18/hiring-data-mining-analytics-machine-learning-hackers/
-
Re: Region replication? Ted Dunning 2011-04-19, 20:09
This is kind of true.

There is only one regionserver to handle the reads, but there are multiple copies of the data to handle fail-over.

On Tue, Apr 19, 2011 at 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> My question has to do with one of the good comments from Edward Capriolo, who
> pointed out that some of the Configurations he described in his Cassandra as
> Memcached talk (
> http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) are
> not possible with HBase because in HBase there is only 1 copy of any given
> Region and it lives on a single RegionServer (I'm assuming this is correct?),
> thus making it impossible to spread reads of data from one Region over multiple
> RegionServers:
-
Re: Region replication? Jean-Daniel Cryans 2011-04-19, 21:10
We have something on the menu:
https://issues.apache.org/jira/browse/HBASE-2357 Coprocessors: Add read-only region replicas (slaves) for availability and fast region recovery

Something to keep in mind is that you have to cache the data for each replica, so a row could be in 3 different caches (which also have to be warmed). I guess this is useful for very hot rows, as opposed to a much larger read distribution, in which case you'd really want to cache it only once; otherwise you'd need 3x the memory to hold your dataset in cache.

J-D
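Editor's note: the cache trade-off J-D describes is simple arithmetic, sketched below. The function name and the sizes (a 2 GB hot set, a 500 GB widely read dataset) are hypothetical numbers chosen for illustration; they are not from the thread and say nothing about how HBASE-2357 would actually be implemented.

```python
def total_cache_gb(cached_data_gb: float, replicas: int) -> float:
    """Memory needed when each replica keeps its own warmed copy of the cached data.

    Assumption (illustrative only): every replica caches the same data in full.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return cached_data_gb * replicas


if __name__ == "__main__":
    # A few very hot rows: tripling the cache is cheap.
    print("hot set, 3 replicas:", total_cache_gb(2.0, 3), "GB")
    # A broad read distribution: tripling the cache means 3x the dataset in memory.
    print("wide dataset, 3 replicas:", total_cache_gb(500.0, 3), "GB")
```

With these made-up sizes, replicating the hot set costs a few extra gigabytes, while replicating the widely read dataset triples an already large footprint, which is the case where caching it only once is preferable.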
-
Re: Region replication? Edward Capriolo 2011-04-19, 21:11
On Tue, Apr 19, 2011 at 4:09 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> This is kind of true.
>
> There is only one regionserver to handle the reads, but there are
> multiple copies of the data to handle fail-over.

It is not "kind of true". It "is" true. A summary of slide 22 is:

Cassandra
20 nodes
Replication Factor 20

Results in: 20 nodes capable of serving these reads!

With HBase, regardless of how many HDFS file copies exist, only one RegionServer can actively serve a region.
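Editor's note: a rough sketch of the read ceiling Edward is pointing at. The per-server capacity of 5,000 reads per second is a hypothetical figure, and the model assumes reads spread evenly over every server able to serve the key; neither assumption comes from the thread or the slides.

```python
def max_read_rate(per_server_reads_per_sec: int, servers_serving_key: int) -> int:
    """Upper bound on reads per second for one key, given how many servers can serve it."""
    if servers_serving_key < 1:
        raise ValueError("at least one server must serve the key")
    return per_server_reads_per_sec * servers_serving_key


if __name__ == "__main__":
    capacity = 5000  # hypothetical reads/sec a single server can sustain

    # Cassandra, 20 nodes, replication factor 20: every node can serve the read.
    print("Cassandra RF=20:", max_read_rate(capacity, 20))

    # HBase: one RegionServer serves the region, however many HDFS copies exist.
    print("HBase:", max_read_rate(capacity, 1))
```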
-
Re: Region replication? Otis Gospodnetic 2011-04-19, 21:26
Thanks J-D!

Yeah, what you describe is also something that I think Edward pointed out in some of his slides - that you could route all requests for X to the place where X is when you don't want to have X cached (in app-level caches and/or OS-level caches) on multiple servers, but that sometimes you do want to "waste" memory like this because you have to spread requests for X over more servers.

Are these two modes going to be supported in HBase?

Thanks,
Otis
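Editor's note: the two modes Otis describes can be sketched as two routing functions. This is a toy model, not HBase or Cassandra code: the hash-based placement, the server names, and the function names are all assumptions made for illustration (a real HBase client locates a row's region by key range rather than by hashing).

```python
import hashlib
import random


def owner_index(key: str, num_servers: int) -> int:
    """Deterministically map a key to one server index (toy hash placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def route_single(key: str, servers: list) -> str:
    """Mode 1: always send requests for a key to the one server that holds it,
    so the key is cached in exactly one place."""
    return servers[owner_index(key, len(servers))]


def route_spread(key: str, servers: list, copies: int, rng=random) -> str:
    """Mode 2: spread requests for a key over `copies` servers, paying for
    `copies` cache entries in exchange for more read capacity."""
    if not 1 <= copies <= len(servers):
        raise ValueError("copies must be between 1 and the number of servers")
    start = owner_index(key, len(servers))
    candidates = [servers[(start + i) % len(servers)] for i in range(copies)]
    return rng.choice(candidates)


if __name__ == "__main__":
    servers = ["rs1", "rs2", "rs3", "rs4"]  # hypothetical server names
    print("single:", route_single("row-X", servers))
    print("spread:", [route_spread("row-X", servers, copies=3) for _ in range(5)])
```

With `copies=1` the second function degenerates to the first, which makes the memory-versus-capacity trade-off a single knob in this model.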
-
Re: Region replication? Jean-Daniel Cryans 2011-04-19, 21:28
I don't know why you would want to serve from other region servers if all they did was transfer data; the current situation would be better.

J-D
-
Re: Region replication? Otis Gospodnetic 2011-04-19, 21:36
To make Configuration 4 possible (last slide in http://www.edwardcapriolo.com/roller/edwardcapriolo/resource/memcache.odp ) -- Big Request Load, not so Big Data.

Otis
-
Re: Region replication? Jean-Daniel Cryans 2011-04-19, 21:42
That configuration is more like what HBASE-2357 would be used for.

You wrote: "that you could route all requests for X to the place where X is when you don't want to have X cached"

And it's for that case that I say you should not go through other nodes, but talk directly to the RS.

J-D