|
anil gupta
2012-05-24, 21:51
Anoop Sam John
2012-05-25, 00:57
Ted Yu
2012-05-25, 04:37
Matt Corgan
2012-05-25, 06:22
anil gupta
2012-05-29, 23:29
Matt Corgan
2012-05-29, 23:46
Anoop Sam John
2012-05-30, 04:26
anil gupta
2012-05-30, 19:57
|
-
Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
anil gupta 2012-05-24, 21:51
Hi All,
We are planning to store data in HBase. In one of our use cases, once a row is written into an HBase table we won't modify the data in that row. Since a timestamp (a long value) is stored for every cell (right?) in HBase, this takes up an extra 8 bytes per cell. Is there a way to disable the timestamp on an HBase table when versioning is not required? I went through the documentation and searched the mailing list but could not find anything relevant. Since we are talking about billions of cells, this adds up to a significant amount of space (around 7.45 GiB for 1 billion cells). Does this sound like a feature HBase is missing?
Please share your thoughts.
--
Thanks & Regards,
Anil Gupta
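As a quick check of the figure quoted above, the arithmetic can be sketched in Java. This is only an illustrative calculation: the class and method names are made up for this example, and it counts just the raw 8-byte timestamp per cell, ignoring any encoding or compression.

```java
import java.util.Locale;

// Hypothetical helper, not part of HBase: estimates the raw space taken
// by an 8-byte timestamp stored with every cell.
final class TimestampOverhead {
    static final int TIMESTAMP_BYTES = Long.BYTES; // a Java long is 8 bytes

    // Total bytes spent on timestamps for the given number of cells.
    static long overheadBytes(long cells) {
        return Math.multiplyExact(cells, (long) TIMESTAMP_BYTES);
    }

    // Converts bytes to binary gigabytes (GiB, 2^30 bytes).
    static double toGiB(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long bytes = overheadBytes(1_000_000_000L);
        // prints "8000000000 bytes = 7.45 GiB"
        System.out.println(String.format(Locale.ROOT, "%d bytes = %.2f GiB", bytes, toGiB(bytes)));
    }
}
```

The 7.45 figure is therefore in binary units; in decimal units the same overhead is 8 GB.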
-
RE: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Anoop Sam John 2012-05-25, 00:57
Hi Anil,
There is no way to avoid the timestamp with KVs (KeyValues). In your case, have you considered data block encoding? Take a look at FastDiffDeltaEncoder and DiffKeyDeltaEncoder. Among other things, they avoid writing the full 8 bytes of timestamp into each KV. Some bytes will still be written, though, and the encoding is done at the block level. Please also note that these encoders do much more than the timestamp space optimization. You also need to make sure to pass some timestamp in your Puts; 0L may be the best choice. Otherwise, on the RS (RegionServer) side, HBase will assign the current time as the timestamp. Things should become clearer once you read the javadoc for these encoder classes.
Whether HBase should have a feature to avoid the timestamp entirely is a topic for discussion.
Hope that makes it clear.
-Anoop-
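To illustrate why a constant timestamp such as 0L helps a block-level encoder, here is a toy model in Java. It is not the on-disk format of FastDiffDeltaEncoder or DiffKeyDeltaEncoder. It only assumes that an encoder may store each timestamp as a difference from the previous one in the block, spending one flag byte per cell plus as few bytes as the difference needs. The class ToyTimestampDelta and its methods are hypothetical.

```java
// Toy model only: an assumed delta scheme, not HBase's real encoder format.
final class ToyTimestampDelta {

    // Bytes needed for one delta: none when the timestamp repeats,
    // otherwise the minimal byte count of its zigzag-encoded value.
    static int deltaBytes(long delta) {
        if (delta == 0) {
            return 0;
        }
        long zigzag = (delta << 1) ^ (delta >> 63); // maps small negatives to small positives
        int bits = 64 - Long.numberOfLeadingZeros(zigzag);
        return (bits + 7) / 8;
    }

    // Size of all timestamps in a block: the first one in full,
    // then one flag byte plus the delta bytes for each later cell.
    static int encodedSize(long[] timestamps) {
        if (timestamps.length == 0) {
            return 0;
        }
        int size = Long.BYTES;
        for (int i = 1; i < timestamps.length; i++) {
            size += 1 + deltaBytes(timestamps[i] - timestamps[i - 1]);
        }
        return size;
    }

    public static void main(String[] args) {
        long[] constant = new long[1000]; // every cell written with timestamp 0L
        long[] increasing = new long[1000]; // server-assigned, one millisecond apart
        for (int i = 0; i < increasing.length; i++) {
            increasing[i] = 1_337_900_000_000L + i;
        }
        System.out.println("raw:        " + 1000 * Long.BYTES);       // 8000
        System.out.println("constant:   " + encodedSize(constant));   // 1007
        System.out.println("increasing: " + encodedSize(increasing)); // 2006
    }
}
```

In this model the timestamp never disappears completely, which matches the remark that some bytes are still written, but a constant value costs the least.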
-
Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Ted Yu 2012-05-25, 04:37
What Anoop said is in 0.94.0.
For trunk, HBASE-4676 provides trie data block encoding. It suits the write-once, read-many use case very well.
Cheers
-
Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Matt Corgan 2012-05-25, 06:22
Hi Anil,
I created HBASE-6093 <https://issues.apache.org/jira/browse/HBASE-6093> with an idea that could solve this problem. It could be a simple implementation for simple workloads, but it gets harder to support for tables with TTLs, maxVersion > 1, Deletes, etc. Maybe it can only be enabled if the other ColumnFamily settings are compatible.
Matt
-
Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
anil gupta 2012-05-29, 23:29
Hi All,
Sorry for the late reply; I got stuck on another task at work on Friday, and skimming through HBASE-4676 took me a while.
HBASE-6093 seems to be very close to my suggestion. The only difference is that Matt mentioned in the description that it can only be used when all inserts are type=Put. Is that restriction due to HFileV2? I think deleting an entire row wouldn't be a problem, right? I have very little knowledge of HFileV2; I will try to read about it soon.
HBASE-4676 seems really cool. IMHO, the current issues are that write and scan are slow (slower by ~2x compared to NONE, if we assume that the trie compresses by ~2-3x) and that, as per the JIRA, if the ratio of value size to key size is big then the trie won't have much impact. Is this feature going to be part of any future release of HBase? Awesome stuff, Matt.
@Anoop: You meant that I should use the feature in HBASE-4676 and pass the timestamp as 0L in each Put, right?
Thanks all for your valuable time and inputs.
-Anil
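The point about the value/key ratio can be made concrete with a small calculation. The Java sketch below assumes, purely for illustration, that an encoding shrinks only the key portion of each cell by a fixed factor and leaves the value untouched. The 2.5x factor and the byte sizes are made-up inputs, not measurements of HBASE-4676, and the class name is hypothetical.

```java
import java.util.Locale;

// Hypothetical calculator: overall size reduction when only keys are encoded.
final class KeyEncodingGain {

    // Ratio of cell size before encoding to cell size after encoding,
    // when the key shrinks by keyFactor and the value stays as it is.
    static double overallFactor(double keyBytes, double valueBytes, double keyFactor) {
        double before = keyBytes + valueBytes;
        double after = keyBytes / keyFactor + valueBytes;
        return before / after;
    }

    public static void main(String[] args) {
        // Small values: most of each cell is key, so the gain is large.
        System.out.println(String.format(Locale.ROOT, "40 B key, 10 B value:   %.2fx",
                overallFactor(40, 10, 2.5)));
        // Large values: the key is a small share of the cell, so the gain nearly vanishes.
        System.out.println(String.format(Locale.ROOT, "40 B key, 1000 B value: %.2fx",
                overallFactor(40, 1000, 2.5)));
    }
}
```

With these inputs the overall reduction is about 1.92x for 10-byte values but only about 1.02x for 1000-byte values, which is the effect the JIRA comment describes.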
-
Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Matt Corgan 2012-05-29, 23:46
> Is this feature going to be part of any future release of HBase?
I couldn't get it finished in time for 0.94, but I think it's very likely to be in 0.96, possibly with a backport to 0.94. Scan speed should improve if I have time to optimize the cell comparators and collators.
-
RE: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
Anoop Sam John 2012-05-30, 04:26
Hi Anil,
As HBASE-4676 is not available as of now, maybe you can check the other encoders, DiffKeyDeltaEncoder or FastDiffDeltaEncoder. Please go through the javadoc of these and see what they do apart from compressing the timestamp parts. They do other nice stuff too, which will make your data smaller on disk.
When HBASE-4676 arrives you can try using that, as I think it would be closer to your need.
Also, please make sure to set the timestamp to 0L in all your Puts. If you don't, HBase will set the current time in millis as the timestamp for each Put.
-Anoop-
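The rule described in the message above (an explicit timestamp is kept, a missing one is replaced by the server's clock) can be modelled in a few lines of Java. This is a simplified sketch, not HBase source code. It assumes the client marks "no timestamp given" with the sentinel Long.MAX_VALUE, which is the value of HConstants.LATEST_TIMESTAMP in HBase; the class and method names are hypothetical.

```java
// Toy model of timestamp assignment; not taken from the HBase code base.
final class ToyTimestampResolver {
    // Sentinel meaning "the client did not supply a timestamp".
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Returns the timestamp that ends up stored with the cell.
    static long resolve(long requested, long serverNowMillis) {
        return requested == LATEST_TIMESTAMP ? serverNowMillis : requested;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // An explicit 0L is stored as 0L, so every cell carries the same value.
        System.out.println(resolve(0L, now));
        // Without an explicit timestamp the server clock is used, so values differ between writes.
        System.out.println(resolve(LATEST_TIMESTAMP, now) == now);
    }
}
```

Passing 0L everywhere keeps all timestamps identical, which is what lets a delta-style encoder store them cheaply.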
-
Re: Disable timestamp in HBase Table a.k.a Disable Versioning in HBase Table
anil gupta 2012-05-30, 19:57
@Anoop: We recently finished the first phase of our POC. It went quite well.
Now we are trying to decide which features we are going to use for the final implementation. We are still in research mode, trying out different options. We are also trying out the LZO and Snappy compression algorithms. Yes, in my POC V1, in my custom mapper for the bulk loader, I was also passing the same current-time-in-millis value for a single row. I can easily change the loader to use 0L as the timestamp for all data.
@Matt: We are using the Cloudera distribution at present, so I will need to ask the Cloudera folks about the HBase version used in CDH4 (at present it's 0.92). I looked at the HBase site and the current stable version is 0.92. So it seems unlikely that 0.96 will be a stable release in the next 3-4 months. Anyway, any idea when HBase 0.96 is supposed to be released as a stable version?
> HBase-6093 seems to be very close to my suggestion. The only difference is
> that Matt mentioned in the description that it can only be used when all
> inserts are type=Put. Is aforementioned restriction due to HFileV2? I think
> deleting an entire row wouldn't be a problem. right?
Any inputs on the above question?
Thanks & Regards,
Anil Gupta
|