|
Sam Seigal
2011-06-03, 07:35
Joey Echeverria
2011-06-03, 12:27
Sam Seigal
2011-06-03, 18:33
tsuna
2011-06-07, 09:07
Kjew Jned
2011-06-08, 02:56
tsuna
2011-06-08, 07:40
Sam Seigal
2011-06-08, 20:54
Otis Gospodnetic
2011-06-09, 01:02
Sam Seigal
2011-06-10, 02:51
Sam Seigal
2011-06-10, 06:12
|
-
hbase hashing algorithm and schema design
Sam Seigal 2011-06-03, 07:35
Hi,

I am not able to find information regarding the algorithm that decides which region a particular row belongs to in an HBase cluster. Does the algorithm take into account the number of physical nodes? Where can I find more details about it?

I went through the HBase book and the OpenTSDB schema examples on schema definitions and problems with monotonically increasing row keys, and had a follow-up question.

I want to be able to query on ranges of time in HBase. Following the OpenTSDB example, I have the following row key format:

<eventid> - <yyyy-mm-dd>

My eventId can be one of 12 distinct values (let us say from A-L), and I have a 4 node cluster running HBase right now. However, these event id values are not evenly distributed. I believe that this implies some of the regions in the cluster are going to grow faster in size than others, and eventually will either automatically split or have to be manually split. Should this be a concern at this point? How is HBase deciding which partition a particular key will go to? I feel that knowing more details about the algorithm can help me design the schema better.

Your help is appreciated.

Thank you.

Sam
-
Re: hbase hashing algorithm and schema design
Joey Echeverria 2011-06-03, 12:27
Rows are split into regions of contiguous row keys. Each region is assigned to a physical server (region server) that hosts queries and updates to rows in that region. Currently, the assignment process is random and only balances the number of regions assigned to each server.

The problem with largely sequential key inserts is that they all go to the region hosting the end of the key space. That makes this region server a potential bottleneck. If you want to improve write performance, you can prefix each key with a hash of the key. The downside is that sequential scans now have to be performed with multiple scanners and re-ordered client side.

-Joey
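A minimal sketch of the point about contiguous key ranges, written in plain Python rather than against any HBase API. The region start keys below are made up for illustration, and `find_region` is a hypothetical helper, not an HBase function. It only shows that the owning region follows from the sort order of the row key, not from a hash of the key or from the number of nodes.

```python
from bisect import bisect_right

# Hypothetical, sorted region start keys. The first region starts at the
# empty key, so every possible row key falls into some region.
REGION_START_KEYS = ["", "D-2011-01-01", "H-2011-01-01"]


def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return the index of the region whose key range contains row_key.

    The owning region is the last one whose start key is <= row_key.
    """
    return bisect_right(start_keys, row_key) - 1
```

With these assumed boundaries, every key for event ids A to C lands in region 0 whatever the date, which is why a busy event id keeps writing into the same range.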
-
Re: hbase hashing algorithm and schema design
Sam Seigal 2011-06-03, 18:33
Hi Joey,

Thanks for your reply. As I mentioned in the previous email, I prefix the key with an "event id" (<eventid> + <timestamp>). However, this event id is not going to be evenly distributed or random. According to some research I did into the data I receive in my system over the last 6 months, 40% of the data comes from one event id, 25% from another, and so on. Since the data distribution is skewed, will the region servers holding the regions for the "hot" event keys become overloaded for writes? If this happens, is splitting the regions going to solve the problem?

Thank you,

Sam
-
Re: hbase hashing algorithm and schema design
tsuna 2011-06-07, 09:07
On Fri, Jun 3, 2011 at 11:33 AM, Sam Seigal <[EMAIL PROTECTED]> wrote:
> Thanks for your reply. As I mentioned in the previous email, I prefix the
> key with an "event id" (<eventid> + <timestamp>). However, this event id is
> not going to be evenly distributed or random. According to some research I
> did into the data I receive in my system over the last 6 months, 40% of the
> data comes from one event id, 25% from another, and so on. Since the
> data distribution is skewed, will the region servers holding the regions
> for the "hot" event keys become overloaded for writes? If this happens, is
> splitting the regions going to solve the problem?

Whether or not the skewed distribution will be a problem depends on how much write load you put on your cluster and how much write capacity your cluster gives you. You should test and see whether, with your cluster and your workload, you can handle your incoming traffic.

If your cluster is unable to handle your write load, splitting regions further won't help, since the "last" 12 regions will always be the ones accepting writes (since you have 12 prefixes). What you can do, however, is manually move regions around so that the "hot" regions are evenly spread across all your physical servers. You can do this from the HBase shell. In the future, when HBase does proper load balancing of regions, this won't be necessary anymore.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
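To make the idea of spreading "hot" regions concrete, here is a rough sketch in plain Python. It is not an HBase API and not how HBase assigns regions. The write shares are hypothetical: only the 40% and 25% figures come from the thread and the rest are invented so that the total is 100. `spread_hot_regions` is an illustrative helper that places the busiest region first on the currently least-loaded server.

```python
import heapq

# Hypothetical write share (percent) of the one "hot" region per event id.
# Only A=40 and B=25 come from the thread; the other values are invented.
HYPOTHETICAL_SHARES = {
    "A": 40, "B": 25, "C": 10, "D": 5, "E": 5, "F": 5,
    "G": 3, "H": 3, "I": 1, "J": 1, "K": 1, "L": 1,
}


def spread_hot_regions(write_share, num_servers):
    """Greedy placement: heaviest region first, onto the least-loaded server.

    Returns {server_index: (total_share, [region names])}.
    """
    # Heap entries are (load, server index, regions); the unique server index
    # breaks ties, so the region lists are never compared.
    heap = [(0, server, []) for server in range(num_servers)]
    heapq.heapify(heap)
    by_weight = sorted(write_share.items(), key=lambda kv: (-kv[1], kv[0]))
    for region, share in by_weight:
        load, server, regions = heapq.heappop(heap)
        regions.append(region)
        heapq.heappush(heap, (load + share, server, regions))
    return {server: (load, regions) for load, server, regions in heap}
```

Calling `spread_hot_regions(HYPOTHETICAL_SHARES, 4)` keeps the 40% region alone on one server. With this made-up skew no placement can bring that server below 40%, so moving regions evens out the remaining servers but cannot dilute a single hot prefix.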
-
Re: hbase hashing algorithm and schema design
Kjew Jned 2011-06-08, 02:56
I was studying the OpenTSDB example, where they also prefix the row keys with event id.

I further modified my row keys to have this ->

<eventid> <uuid> <yyyy-mm-dd>

The uuid is fairly unique and random. Does appending a uuid to the event id help the distribution?

Let us say I have 4 region servers to start off with and I start the workload: how does HBase decide how many regions it is going to create, and what key is going to go into what region?

I could have gone with something like

<uuid><eventid><yyyy-mm-dd>

but would not like to, since my queries are always going to be against a particular event id type, and I would like them to be spatially located.
-
Re: hbase hashing algorithm and schema design
tsuna 2011-06-08, 07:40
On Tue, Jun 7, 2011 at 7:56 PM, Kjew Jned <[EMAIL PROTECTED]> wrote:
> I was studying the OpenTSDB example, where they also prefix the row keys with
> event id.
>
> I further modified my row keys to have this ->
>
> <eventid> <uuid> <yyyy-mm-dd>
>
> The uuid is fairly unique and random.
> Does appending a uuid to the event id help the distribution?

Yes, it will help the distribution, but it will also make certain query patterns harder. You can no longer scan for a time range for a given eventid. How to solve this problem depends on how you generate the UUIDs.

I wouldn't recommend doing this unless you've already tried simpler approaches and reached the conclusion that they don't work. Many people seem to be afraid of creating hot spots in their tables without having first-hand evidence that the hot spots would actually be a problem.

> Let us say I have 4 region servers to start off with and I start the

If you have only 4 region servers, your goal should be to have roughly 25% of writes going to each server. It doesn't matter whether the 25% slice of one server goes to a single region or not. As long as all the writes don't go to the same row (which would cause lock contention on that row), you'll get the same kind of performance.

> workload: how does HBase decide how many regions it is going to create, and what
> key is going to go into what region?

Your table starts as a single region. As this region fills up, it'll split. Where it splits is chosen by HBase. HBase tries to split the region "in the middle", so that roughly the same number of keys ends up in each new daughter region.

You can also manually pre-split a table (from the shell). This can be advantageous in certain situations where you know what your table will look like and you have a very high write volume coupled with aggressive latency requirements for >95th percentile.

> I could have gone with something like
>
> <uuid><eventid><yyyy-mm-dd>
>
> but would not like to, since my queries are always
> going to be against a particular event id type, and I would like them to be
> spatially located.

If you have a lot of data per <eventid>, then putting the <uuid> in between the <eventid> and the <yyyy-mm-dd> will screw up data locality anyway. But the exact details depend on how you pick the <uuid>.

--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
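As an illustration of splitting "in the middle", the sketch below models a region as a sorted list of row keys and cuts it at the middle key. This is a simplification for intuition only: HBase's actual split-point selection is more involved than indexing into a key list, and `split_region` is a hypothetical helper, not an HBase call.

```python
def split_region(sorted_keys):
    """Split a region's sorted row keys near the middle.

    Returns (split_key, lower_daughter, upper_daughter). The split key becomes
    the start key of the upper daughter, so the two ranges stay contiguous.
    """
    if len(sorted_keys) < 2:
        raise ValueError("need at least two keys to split a region")
    mid = len(sorted_keys) // 2
    return sorted_keys[mid], sorted_keys[:mid], sorted_keys[mid:]
```

Pre-splitting amounts to supplying such boundary keys up front instead of waiting for regions to fill; with 12 event id prefixes, one plausible choice (an assumption, not advice from the thread) would be a boundary at each prefix.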
-
Re: hbase hashing algorithm and schema design
Sam Seigal 2011-06-08, 20:54
On Wed, Jun 8, 2011 at 12:40 AM, tsuna <[EMAIL PROTECTED]> wrote:
> Yes, it will help the distribution, but it will also make certain query
> patterns harder. You can no longer scan for a time range for a given
> eventid. How to solve this problem depends on how you generate the
> UUIDs.
>
> I wouldn't recommend doing this unless you've already tried simpler
> approaches and reached the conclusion that they don't work. Many
> people seem to be afraid of creating hot spots in their tables without
> having first-hand evidence that the hot spots would actually be a
> problem.

Can I not use regex row filters to query for date ranges? There is an added overhead for the client to order the results, and it is not an efficient query, but it is certainly possible to do so? Am I wrong?

> If you have only 4 region servers, your goal should be to have roughly
> 25% of writes going to each server. It doesn't matter whether the 25%
> slice of one server goes to a single region or not. As long as
> all the writes don't go to the same row (which would cause lock
> contention on that row), you'll get the same kind of performance.

I am worried about the following scenario, hence putting a lot of thought into how to design this schema.

For example, for simplicity's sake, say I only have two event ids, A and B, and the traffic is equally distributed between them, i.e. 50% of my traffic is event A and 50% is event B. I have two region servers running on two physical nodes, with the following schema:

<eventid> <timestamp>

Ideally, I now have all of the A traffic going into region server A and all of the B traffic going into region server B. The cluster is able to handle this traffic, and the write load is distributed 50-50.

However, now I reach a point where I need to scale, since the two region servers are no longer able to cope with the write traffic. Adding extra region servers to the cluster is not going to make any difference, since only the physical machines holding the tail-end regions will receive the traffic. Most of the rest of my cluster is going to be idle.

To generalize, if I want to scale to a point where the # of machines is greater than the # of unique event ids, I have no way to distribute the load in an efficient manner, since I cannot distribute the load of a single event id across multiple machines (without adding a uuid somewhere in the middle and sacrificing data locality on ordered timestamps).

Do you think my concern is genuine? Thanks a lot for your help.
-
Re: hbase hashing algorithm and schema design
Otis Gospodnetic 2011-06-09, 01:02
Sam, would HBaseWD help you here?

See http://search-hadoop.com/m/AQ7CG2GkiO/hbasewd&subj=+ANN+HBaseWD+Distribute+Sequential+Writes+in+HBase

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Hadoop ecosystem search :: http://search-hadoop.com/
-
Re: hbase hashing algorithm and schema design
Sam Seigal 2011-06-10, 02:51
Thanks for the reply Joey. This sounds better than using uuids. I will give it a shot.

One more thing: with such a setup, will it still be possible to do map reduce jobs? Is it possible to create a single scanner that will look at all the prefixes? If not, is it possible to map reduce with multiple scanners?

Thanks a lot for your help.

------------------------------
From: Joey Echeverria <[EMAIL PROTECTED]>
To: Sam Seigal <[EMAIL PROTECTED]>
Sent: Wed, June 8, 2011 5:08:32 PM
Subject: Re: hbase hashing algorithm and schema design

A better option than a uuid would be to take a hash of the eventid-timestamp, modulo some value (maybe 4x #regionservers) and add that to the front of the key. If you need to scan, create a scanner per prefix and merge the results.

-Joey
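A minimal sketch of the hash-prefix idea discussed in this thread (a hash of eventid-timestamp, modulo a bucket count, prepended to the key, with one scan per prefix merged client side). It is plain Python over an in-memory list and uses no HBase client API. The bucket count of 16 (4 x 4 region servers), the SHA-256 hash, the two-digit prefix format, and the names `salted_key`, `unsalt` and `scan_all_buckets` are all assumptions made for illustration.

```python
import hashlib
import heapq

NUM_BUCKETS = 16  # assumption: 4 x 4 region servers


def salted_key(event_id, timestamp, num_buckets=NUM_BUCKETS):
    """Prefix '<eventid>-<timestamp>' with hash(key) mod num_buckets."""
    base = f"{event_id}-{timestamp}"
    digest = hashlib.sha256(base.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}-{base}"


def unsalt(key):
    """Drop the bucket prefix, returning the original '<eventid>-<timestamp>'."""
    return key.split("-", 1)[1]


def scan_all_buckets(rows, event_id, start, stop, num_buckets=NUM_BUCKETS):
    """Emulate one range scan per prefix, then merge back into time order.

    rows is an iterable of salted keys; start is inclusive, stop is exclusive.
    """
    ordered = sorted(rows)  # stands in for the table's sorted key space
    per_bucket = []
    for bucket in range(num_buckets):
        lo = f"{bucket:02d}-{event_id}-{start}"
        hi = f"{bucket:02d}-{event_id}-{stop}"
        per_bucket.append([k for k in ordered if lo <= k < hi])
    # Each per-bucket result is already sorted, so a k-way merge on the
    # unsalted key restores the global order.
    return [unsalt(k) for k in heapq.merge(*per_bucket, key=unsalt)]
```

The trade-off is the one described above: writes for a single event id spread over up to `NUM_BUCKETS` key ranges, while every time-range read costs `NUM_BUCKETS` scans plus a merge.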
-
Re: hbase hashing algorithm and schema design
Sam Seigal 2011-06-10, 06:12
Hi Otis,

This approach looks much better than the uuid approach. I will definitely try it out. Thanks for the contribution.

Sam
|