-
Questions on Table design for time series data
Karthikeyan Muthukumarasa... 2012-10-03, 13:27
Hi,

Our use case is as follows: we have time series data continuously flowing into the system, and it has to be stored in HBase. Subscriber Mobile Number (a.k.a. MSISDN) is the primary identifier based on which data is stored and later retrieved. There are two sets of parameters that get stored in every record in HBase; let's call them group1 and group2. Records carrying group1 parameters arrive at approx. 6 per day, and records carrying group2 parameters at approx. 1 per 3 days (their cardinality is different).

Typically, the retention policy for group1 parameters is 3 months and for group2 parameters is 1 year. The read pattern is as follows: an online query asks for records matching an MSISDN for a given date range, and the system needs to respond with all available data (both from group1 and group2) satisfying the MSISDN and date range filters.

Question 1:
Alternative 1: Create a single table with G1 and G2 as two column families.
Alternative 2: Create two tables, one for each group.
Which is the better alternative, and what are the pros and cons?

Question 2:
To achieve max. distribution during write and reasonable complexity during read, we decided on the following row key design:
<last 3 digits of MSISDN>,<MMDD>,<full MSISDN>
We will manually pre-split regions for the table based on the <last 3 digits of MSISDN>,<MMDD> part of the row key. So there are 1000 (from 3 digits of MSISDN) * 365 (from MMDD) buckets, which would translate to as many regions.
In this case, when retention is configured as < 1 year, the design looks optimal. When retention is configured > 1 year, one region might store data for more than 1 day (Feb 1 of 2012 and also Feb 1 of 2013), which means more data has to be handled by HBase during compactions and reads.
An alternative key design, which does not have the above disadvantage, is:
<last 3 digits of MSISDN>,<YYYYMMDD>,<full MSISDN>
This way, one region holds only 1 day's data at any point, regardless of retention.
What are the other pros & cons of the two key designs?

Question 3:
In our use case, deletes happen only based on the retention policy, where one day's full data has to be deleted when the retention period is crossed (e.g., if retention is 30 days, on Apr 1 all the data for Mar 1 is deleted).
What is the most optimal way to implement this retention policy?
Alternative 1: TTL is configured for the column family and we leave it to HBase to delete data during major compaction, but we are not sure of the cost of this major compaction happening in all regions at the same time.
Alternative 2: Through the key design logic mentioned before, if we ensure data for one day goes into one set of regions, can we use HBase APIs like HFileArchiver to programmatically archive and drop regions?

Thanks & Regards
MK
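The two key layouts in Question 2 can be compared with a small sketch. This is plain Python string handling, not HBase client code; the MSISDN value and the helper names `row_key` and `daily_keys` are hypothetical and exist only for illustration. It shows two consequences that follow directly from the layouts as written: with `<MMDD>` the same subscriber produces the same row key in different years, and because the date sits ahead of the full MSISDN, a date-range read for one subscriber becomes one key lookup per day rather than a single contiguous range.

```python
from datetime import date, timedelta


def row_key(msisdn: str, day: date, with_year: bool = True) -> str:
    """Compose '<last 3 digits of MSISDN>,<date>,<full MSISDN>'."""
    stamp = day.strftime("%Y%m%d" if with_year else "%m%d")
    return f"{msisdn[-3:]},{stamp},{msisdn}"


def daily_keys(msisdn: str, start: date, end: date, with_year: bool = True) -> list:
    """One row key per calendar day in the inclusive range [start, end]."""
    days = (end - start).days + 1
    return [row_key(msisdn, start + timedelta(days=i), with_year) for i in range(days)]


# Hypothetical subscriber number, for illustration only.
msisdn = "919812345678"

# MMDD layout: Feb 1 of 2012 and Feb 1 of 2013 collapse onto one row key.
same_key = (row_key(msisdn, date(2012, 2, 1), with_year=False)
            == row_key(msisdn, date(2013, 2, 1), with_year=False))

# YYYYMMDD layout: the keys stay distinct across years.
keys = daily_keys(msisdn, date(2012, 12, 31), date(2013, 1, 1))

# Number of pre-split buckets described in the post.
buckets = 1000 * 365  # 365000
```

With the MMDD layout, rows from different years would have to be told apart by something other than the key (for example the cell timestamp), which the post does not specify.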
-
Re: Questions on Table design for time series data
Jacques 2012-10-03, 18:21
I would suggest you watch this video:
http://www.cloudera.com/resource/video-hbasecon-2012-real-performance-gains-with-real-time-data/

The Jive guys solved a lot of the problems you're talking about and discuss them in that case study.
-
Re: Questions on Table design for time series data
Karthikeyan Muthukumarasa... 2012-10-03, 18:31
Hi Jacques,

Thanks for the response! Yes, I have seen the video before. It suggests a TTL-based retention implementation. In their use case, Jive has a fixed retention, say 3 months, so they can pre-create regions for that many buckets; their bucket id is DAY_OF_YEAR%retention_in_days. But in our use case the retention period is configurable, so pre-creating regions based on retention will not work. That's why we went for MMDD-based buckets, which are immune to retention period changes.

Now that you know that I've gone through that video from Jive, I would request you to re-read my specific questions and share your suggestions.

Thanks & Regards
MK
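The bucket formula mentioned above, DAY_OF_YEAR%retention_in_days, can be sketched as follows. This only illustrates the arithmetic as described in this message; it is not taken from Jive's code, and the function name `bucket_id` is hypothetical. It shows why a configurable retention period conflicts with pre-created regions: the same calendar day maps to a different bucket as soon as the retention value changes.

```python
from datetime import date


def bucket_id(day: date, retention_days: int) -> int:
    """Bucket = day of year modulo the retention period in days."""
    return day.timetuple().tm_yday % retention_days


# 2012-10-03 is day 277 of the year (2012 is a leap year).
with_90_days = bucket_id(date(2012, 10, 3), 90)    # 277 % 90  -> 7
with_365_days = bucket_id(date(2012, 10, 3), 365)  # 277 % 365 -> 277
```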
-
Re: Questions on Table design for time series data
Jacques 2012-10-03, 18:59
We're all volunteers here, so we don't always have the time to fully understand and plan others' schemas.

In general, your questions seem to be worried about a lot of things that may or may not matter depending on the specifics of your implementation. Without knowing those specifics it is hard to be super definitive. You seem to be very worried about the cost of compactions and retention. Is that because you're having issues now?

Short answers:

q1: Unless you have a good reason for splitting up into two tables, I'd keep it as one. Pros: easier to understand / better matches intellectual understanding / allows checkAndPuts across both families / data is colocated (server, not disk) on retrieval if you want to work with both groups simultaneously using get, MR, etc. Con: there will be some extra merge/flush activity if the two column families grow at substantially different rates.

q2: 365*1000 regions is problematic (if that is what you're suggesting). Even with HFile v2 and partially loaded multi-level indexes, there is still quite a bit of overhead per region. I pointed you at the Jive thing in part because hashing that value as a bucket seems a lot more reasonable. Additional random idea: if you know the retention policy on insert and your data is immutable post insertion, consider shifting the insert timestamp and maintaining a single TTL. That would require more client-side code but would allow configurable TTLs while utilizing existing HBase infrastructure.

q3: Sounds like you're prematurely optimizing here. Maybe others would disagree. I'd use TTL until you find that it isn't performant enough. The tension between flexibility and speed is clear here. I'd say you either need to pick specific TTLs and optimize for that scenario via region pruning (e.g. separate tables for each TTL type), or you need to use a more general approach that leverages the per-value TTL and compaction methodology. There is enough operational work managing an HBase/HDFS cluster without having to worry about specialized region management.

Jacques
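One possible reading of the "shift the insert timestamp and maintain a single TTL" idea from q2, as a minimal sketch. It assumes the column family carries one fixed TTL, that a cell expires once its timestamp is older than that TTL, and that the client supplies the cell timestamp explicitly on write. The function names are hypothetical, and this is arithmetic only, not HBase client code.

```python
DAY_S = 24 * 60 * 60


def shifted_timestamp_ms(now_ms: int, table_ttl_s: int, retention_s: int) -> int:
    """Back-date the cell so that the fixed table TTL expires it after retention_s."""
    if retention_s > table_ttl_s:
        raise ValueError("retention cannot exceed the table-wide TTL")
    return now_ms - (table_ttl_s - retention_s) * 1000


def expires_at_ms(cell_ts_ms: int, table_ttl_s: int) -> int:
    """Moment at which a cell with this timestamp crosses the table-wide TTL."""
    return cell_ts_ms + table_ttl_s * 1000


# Example: table TTL of 365 days, a record that should live for 90 days.
now_ms = 1_349_280_000_000  # arbitrary example instant in epoch milliseconds
ts = shifted_timestamp_ms(now_ms, 365 * DAY_S, 90 * DAY_S)
expiry = expires_at_ms(ts, 365 * DAY_S)  # equals now_ms + 90 days
```

The trade-off in this reading is that the stored timestamp no longer equals the real insert time, so any read that filters on cell timestamps has to apply the same shift.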
-
Re: Questions on Table design for time series data
Eugeny Morozov 2012-10-03, 20:50
I'd suggest thinking about manual major compactions and splits. Using manual compactions and bulk load lets you split HFiles manually. For example, if you want to read the last 3 months more often than all other data, you could have three HFiles, one for each month, and one HFile for everything else. Using scan.setTimestamps would let the scan pick out only those three HFiles, so the scan would be faster.

Moreover, if you have a TTL of about one month, there is no need to run major compaction every day (as in auto mode). In particular, when using bulk loads you basically control the size of the output HFiles by giving input of a particular size. Say you give input for the last two weeks; you then have one HFile per region for the last two weeks.

Using the new feature known as Coprocessors, you could hook into the compactSelection process and alter the compaction by choosing the HFiles you would like to process. That allows you to combine particular HFiles.

All of that allows you to run major compaction just once or twice a month. Major compaction over a huge amount of data is a heavy operation; the rarer, the better.

Though without monitoring and measurement it looks like early optimization.

Evgeny Morozov
Developer
Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
[EMAIL PROTECTED]
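The effect of restricting a scan to a time range, as described above, can be pictured with a toy model. This is not HBase code: `StoreFile` and `files_to_read` are hypothetical names, and the model simply assumes each file records the minimum and maximum cell timestamp it contains, so that files entirely outside the requested range can be skipped. In the HBase client the time restriction is, to my knowledge, set with `Scan.setTimeRange`; treat that mapping as an assumption.

```python
from typing import List, NamedTuple


class StoreFile(NamedTuple):
    name: str
    min_ts: int  # smallest cell timestamp in the file
    max_ts: int  # largest cell timestamp in the file


def files_to_read(files: List[StoreFile], start_ts: int, end_ts: int) -> List[StoreFile]:
    """Keep files whose [min_ts, max_ts] overlaps the half-open range [start_ts, end_ts)."""
    return [f for f in files if f.min_ts < end_ts and f.max_ts >= start_ts]


# One file per recent month plus one file holding all older data.
files = [
    StoreFile("older", 0, 99),
    StoreFile("month-1", 100, 199),
    StoreFile("month-2", 200, 299),
    StoreFile("month-3", 300, 399),
]

# A scan over the two most recent months touches only two of the four files.
selected = [f.name for f in files_to_read(files, 200, 400)]
```

The sketch only works as well as the file layout allows: if one file mixes old and new timestamps, its range overlaps almost every query and nothing is skipped, which is why the message pairs the idea with controlled bulk loads and compactions.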
-
Re: Questions on Table design for time series data
Karthikeyan Muthukumarasa... 2012-10-05, 04:52
Jacques: I think you got me wrong on my statement. I was only requesting that you think again about my questions, assuming that I have seen the Jive video, since there are some differences in our case compared to Jive. I completely understand that all this is voluntary effort, and my sincere thanks for your suggestions. I will work through them and get back with updates.

Thanks again!
-
Re: Questions on Table design for time series data
Karthikeyan Muthukumarasa... 2012-10-05, 04:53
Thanks Eugeny. We are currently running some experiments based on your suggestions!