on the impact of incremental counters (Claudio Martella, 2011-06-18, 16:00)
Hello list,
I was at SIGMOD a few days ago and was happy to attend Facebook's talk on HBase. As I understood it, their workflow makes heavy use of incremental counters for analytics, and so does mine.

As I understand it, the cost of incrementing a counter is 2 * N + 1 IOPS, where N is the number of sequence files over which my dataset is spread: 2 because each file needs a seek AND a read, and the final 1 comes from the write to the append log. Since that looks like an expensive operation, I was wondering whether I am missing something, and what strategies exist to alleviate such a cost (apart from bloom filters).

Thanks!
Claudio

--
Claudio Martella
Digital Technologies Unit
Research & Development - Analyst
TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax +39 0471 068 129
[EMAIL PROTECTED]
http://www.tis.bz.it

Short information regarding use of personal data. According to Section 13 of Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we process your personal data in order to fulfil contractual and fiscal obligations and also to send you information regarding our services and events. Your personal data are processed with and without electronic means and by respecting data subjects' rights, fundamental freedoms and dignity, particularly with regard to confidentiality, personal identity and the right to personal data protection. At any time and without formalities you can write an e-mail to [EMAIL PROTECTED] in order to object the processing of your personal data for the purpose of sending advertising materials and also to exercise the right to access personal data and other rights referred to in Section 7 of Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, Siemens Street n. 19, Bolzano. You can find the complete information on the web site www.tis.bz.it.
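Claudio's worst-case estimate can be written down as a one-line cost function. This is only a sketch of his model as stated (no block cache, no bloom filters, nothing already in the MemStore); the function name is hypothetical, and the replies below explain why the real cost is usually lower.

```python
def naive_increment_iops(num_store_files: int) -> int:
    """Claudio's estimate: a seek plus a read per store file, plus one log append."""
    if num_store_files < 0:
        raise ValueError("num_store_files must be non-negative")
    return 2 * num_store_files + 1


for n in (1, 3, 10):
    print(n, naive_increment_iops(n))  # prints "1 3", then "3 7", then "10 21"
```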
Re: on the impact of incremental counters (Andrew Purtell, 2011-06-18, 19:24)
This is from memory, but I expect someone will chime in if any detail is inaccurate. :-)
If the blocks containing the values you are updating fit into the blockcache, then read IOPS are avoided: reads are satisfied from cache, not disk. Evictions from the blockcache are done on an LRU basis. (Packing related, frequently accessed items into a block via multiple columns and/or appropriate row key construction is therefore advisable.)

When seeking for a value, HBase examines storefiles from the most recent backwards and stops as soon as the required number of versions, including delete markers, has been found. Updates only care about the most recent version. The most recent versions of frequently updated values are likely to be found early in the search; the more frequently a value is updated, the earlier it is found.

By enabling bloom filters you can also avoid reads of storefiles known not to contain the value(s) as HBase searches backwards in time. The bloom filter checks themselves are cheap because the bloom filters are held in the blockcache and are frequently accessed by various queries, so they are likely to remain resident even if the index or data blocks of the storefiles in question were evicted.

Once you update a value, the most recent version will be held in the MemStore until flush. If the value is updated with an Increment again before the flush, the read-update-replace then happens in memory. Increments do not add to the size of the MemStore, so if you are only incrementing existing values you will not trigger flushing. (Here, however, we can make improvements to HBase to reduce unnecessary overheads in the current code. There are a couple of JIRAs open to this effect. For example, to my understanding Facebook runs with a local patch that introduces mutable KeyValues for the MemStore, eliminating some unnecessary data structure management overheads.)

Unless you explicitly ask HBase to do otherwise (by calling .setWriteToWAL(false)), there will be a write to and sync of the write-ahead log (WAL) for every commit, that is, for every Increment.
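The read half of the increment path Andy describes (MemStore first, then storefiles from newest to oldest, skipping files via bloom filters and serving cached files from memory) can be sketched as a toy model. Everything here is hypothetical and simplified, not HBase code: an exact key set stands in for each bloom filter, the cache is tracked per file rather than per block, and each file touched on a cache miss counts as one disk read.

```python
class StoreFile:
    """Hypothetical immutable store file mapping row key -> counter value."""

    def __init__(self, cells):
        self.cells = dict(cells)
        # Stand-in for a bloom filter: an exact key set. A real bloom filter
        # may give false positives (an extra read) but never false negatives.
        self.bloom = set(self.cells)


def increment(key, delta, memstore, store_files, block_cache, use_bloom=True):
    """Apply one read-modify-write increment; return (new_value, files_read).

    store_files is ordered newest first. block_cache holds the indexes of
    store files already cached (a simplification: real caching is per block).
    """
    files_read = 0
    if key in memstore:
        # The most recent version is already in memory: no file access at all.
        current = memstore[key]
    else:
        current = 0
        for i, store_file in enumerate(store_files):  # newest to oldest
            if use_bloom and key not in store_file.bloom:
                continue  # the bloom filter says "not here": skip the file
            if i not in block_cache:
                files_read += 1  # cache miss: this file is read from disk
                block_cache.add(i)
            if key in store_file.cells:
                current = store_file.cells[key]
                break  # only the newest version matters for an update
    # The new value lives in the MemStore until the next flush.
    memstore[key] = current + delta
    return memstore[key], files_read


files = [StoreFile({"b": 5}), StoreFile({"c": 1}), StoreFile({"a": 40})]
print(increment("a", 1, {}, files, set(), use_bloom=False))  # (41, 3)
memstore, cache = {}, set()
print(increment("a", 1, memstore, files, cache))  # (41, 1)
print(increment("a", 1, memstore, files, cache))  # (42, 0)
```

The first call is the worst case Claudio described; the bloom filter cuts it to a single file read, and once the value is in the MemStore a repeated increment touches no files at all.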
For each such sync, the RegionServer's RPC handler for the current client operation blocks until HDFS acknowledges the write at all DataNodes in the write pipeline, typically configured for 3 replicas. Note that HBase does per-row group commit here, so you can amortize this cost over multiple updates if you can group them: put values that are often updated together into the same row.

Also note that HBase users typically run on top of an HDFS that has been patched with HDFS-895, so HDFS can concurrently service new WAL writes while syncs of others are in progress.

<wizard>
Furthermore, the WAL is by default configured to sync at every commit, but it can also be configured to sync after N commits (e.g. 100 or 1000) or after S seconds (e.g. 1 or 10), whichever comes first, according to the loss window (upon RegionServer failure) your use case can tolerate. So in addition to taking advantage of group commit, you can amortize the sync overhead further, with the tradeoff that under failure conditions your counters (or other data) may become imprecise. For some use cases that is fine.
</wizard>

- Andy
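The sync amortization in Andy's last paragraph can be illustrated by counting WAL syncs under a "sync after N commits or S seconds, whichever comes first" policy. Andy does not name the configuration properties, so the parameters below (every_n, every_s) are hypothetical. Each timestamp stands for one commit, which with per-row group commit may carry several column updates of the same row. As a further simplification, the time limit is only checked when a commit arrives. The second value returned is the number of commits that would be at risk if the RegionServer failed at the end of the run.

```python
def count_wal_syncs(commit_times, every_n=1, every_s=None):
    """Count WAL syncs for ascending commit timestamps (in seconds).

    every_n=1 models sync-per-commit (the default Andy describes). Larger
    every_n and/or every_s model a deferred policy: sync after N commits or
    after S seconds since the last sync, whichever comes first.
    Returns (syncs, commits_left_unsynced).
    """
    syncs = 0
    pending = 0
    last_sync = commit_times[0] if commit_times else 0.0
    for t in commit_times:
        pending += 1
        timed_out = every_s is not None and t - last_sync >= every_s
        if pending >= every_n or timed_out:
            syncs += 1      # one sync covers every pending commit
            pending = 0
            last_sync = t
    return syncs, pending


times = [i / 10 for i in range(100)]  # 100 commits, one every 0.1 s
print(count_wal_syncs(times))                           # (100, 0)
print(count_wal_syncs(times, every_n=10))               # (10, 0)
print(count_wal_syncs(times, every_n=1000, every_s=1.0))  # (9, 9)
```

Deferring syncs cuts the number of blocking syncs by an order of magnitude, and the unsynced tail is exactly the imprecision Andy warns counters may suffer after a failure.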
Re: on the impact of incremental counters (Claudio Martella, 2011-06-20, 12:58)
This all makes a lot of sense and matches my current understanding of things. So, basically, it's expensive to increment old data.

Thanks for the time and the detailed answer.
Re: on the impact of incremental counters (Andrew Purtell, 2011-06-20, 15:14)
> From: Claudio Martella <[EMAIL PROTECTED]>
> So, basically it's expensive to increment old data.

HBase employs a buffer hierarchy to make updating a working set that fits in RAM reasonably efficient. (But, as I said, there are still some things we can improve in terms of internal data structure management.)

If you are updating a working set that does not fit in RAM, or updating values so infrequently that they are not kept in cache, then HBase has to go to disk, and we move from the order of memory access to the order of disk access.

It will obviously be more expensive to increment old data than newer data, but I'm not sure I understand what you are getting at. Any data management system with a buffer hierarchy has this behavior.

Compared to what?

- Andy
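The buffer-hierarchy effect Andy describes can be seen with a minimal LRU cache standing in for the block cache. This is a simplification, and the class and function names are hypothetical. When the working set fits, only the first pass misses; when a cyclic working set is larger than a pure LRU cache, every access misses, a known weakness of plain LRU.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache used as a stand-in for the block cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # would be a disk read
            self.entries[block_id] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def hit_ratio(working_set, capacity, rounds=10):
    """Repeatedly touch blocks 0..working_set-1 and report the cache hit ratio."""
    cache = LRUCache(capacity)
    for _ in range(rounds):
        for block in range(working_set):  # cyclic access pattern
            cache.access(block)
    return cache.hits / (cache.hits + cache.misses)


print(hit_ratio(100, 128))  # 0.9: only the first pass goes to disk
print(hit_ratio(200, 128))  # 0.0: every access goes to disk
```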
Re: on the impact of incremental counters (Joey Echeverria, 2011-06-20, 15:23)
Is there any reason why the increment has to actually happen on insert? Couldn't an "increment record" be kept, and then the actual increment operation be performed lazily, on reads and compactions?

-Joey

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
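Joey's "increment record" idea can be sketched as a counter that appends deltas on write and folds them on read or compaction. This is a hypothetical model, not an existing HBase API: increments become blind writes that need no read, at the price of reads that must sum every pending record.

```python
class LazyCounter:
    """Hypothetical counter that stores increment records and folds them lazily."""

    def __init__(self):
        self.base = 0      # value as of the last compaction
        self.deltas = []   # increment records written since then

    def increment(self, delta):
        # A blind append: no read of the current value is needed.
        self.deltas.append(delta)

    def read(self):
        # The cost of a read grows with the number of pending records.
        return self.base + sum(self.deltas)

    def compact(self):
        # Collapse all pending records into the base value.
        self.base = self.read()
        self.deltas.clear()


counter = LazyCounter()
for _ in range(5):
    counter.increment(2)
print(counter.read(), len(counter.deltas))  # 10 5
counter.compact()
print(counter.read(), len(counter.deltas))  # 10 0
```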
Re: on the impact of incremental counters (Ted Yu, 2011-06-20, 15:36)
I think Dhruba did try the approach Joey mentioned.
Re: on the impact of incremental counters (Ted Dunning, 2011-06-20, 15:50)
Lazy increment on read causes the read to be expensive. That might be a win if the workload has lots of data that is never read.

This could be a good idea on average, because my impression is that increment is usually used for metric sorts of data, which are often only read in detail in diagnostic, post-mortem use cases.
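Ted's trade-off can be put into numbers with a toy cost model. The unit costs are assumptions, not measurements: an eager increment is one read plus one write, a lazy increment is a single write, and a lazy read, assumed to happen before any compaction, must fold the base value and every pending record.

```python
def eager_cost(increments, reads):
    # Each increment is a read-modify-write (1 read + 1 write); each read is 1 read.
    return 2 * increments + reads


def lazy_cost(increments, reads):
    # Each increment is a blind append (1 write); each read, assumed to happen
    # before any compaction, touches the base value plus every pending record.
    return increments + reads * (1 + increments)


for reads in (0, 10):
    print(reads, eager_cost(1000, reads), lazy_cost(1000, reads))
# prints "0 2000 1000", then "10 2010 11010"
```

Under these assumptions the lazy variant halves the cost when values are never read, but with just ten reads over a thousand increments it is already several times more expensive, which matches Ted's point that it pays off only for write-mostly data.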
Re: on the impact of incremental counters (Joe Pallas, 2011-06-20, 17:27)
Just so we're clear, we'd be talking about a new operation, right? Because today's increment returns the incremented value, and some uses (like generating unique values) do require that.

joe
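Joe's point about unique values is easy to see in miniature: if an increment atomically returns the new value, concurrent callers each receive a distinct number, whereas an operation that returns nothing could not be used this way. The counter below is a hypothetical in-process stand-in, with a lock playing the role of the server's atomic read-modify-write.

```python
import threading


class Counter:
    """Hypothetical counter whose increment returns the new value."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self, delta=1):
        with self._lock:  # make read-modify-write atomic
            self._value += delta
            return self._value


def allocate_ids(counter, workers=8, per_worker=1000):
    """Have several threads draw IDs from the counter; return all IDs drawn."""
    ids = []
    ids_lock = threading.Lock()

    def work():
        local = [counter.increment() for _ in range(per_worker)]
        with ids_lock:
            ids.extend(local)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


ids = allocate_ids(Counter())
print(len(ids), len(set(ids)))  # 8000 8000: every caller got a unique value
```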
Re: on the impact of incremental counters (Joey Echeverria, 2011-06-20, 18:03)
Ah, I didn't realize that increment returns the value. Yes, the current behavior is required in that case. I was thinking of a use case more like the one Ted described, where you're keeping metrics but don't read the values that frequently. Maybe this should be a new API call.

If no one objects, I'll file a JIRA.

-Joey
Re: on the impact of incremental counters (Jeff Whiting, 2011-06-20, 18:28)
I think it really splits on how people are using it. I agree that for some it is "increment and forget" until an infrequent analysis is run, while others increment and read the value very often. We do both, but our most frequent use is reading the value very often. If changes are made to the API, let's make sure both use cases are considered, not just "increment and forget."

~Jeff

--
Jeff Whiting
Qualtrics Senior Software Engineer
[EMAIL PROTECTED]