Sami Omer
2012-10-16, 20:51
Eric Newton
2012-10-16, 22:19
David Medinets
2012-10-17, 01:03
Mike Drob
2012-10-17, 01:15
Christopher Tubbs
2012-10-17, 06:32
ameet kini
2012-10-17, 12:40
Sami Omer
2012-10-17, 15:56
-
Concurrent updates
Sami Omer 2012-10-16, 20:51
Hello everyone,

I'm using Accumulo 1.3.6 as the backend for a project I've been working on, and I'm relatively new to it.

I have written a client that appends to some of the data stored in my Accumulo backend. If multiple clients run and perform the read/update operation simultaneously, I might run into concurrency problems, so I was wondering what could be done to prevent such race conditions. Does Accumulo have an equivalent to an RDBMS's transactions? Or is there a way to lock rows that are currently being processed for read/update? Do you have any other ways to solve the issue of concurrent updates?

Your help is greatly appreciated.

Sami
-
Re: Concurrent updates
Eric Newton 2012-10-16, 22:19
Accumulo does not have locks, nor does it have transactions. It does support atomic, isolated updates within a row.

Accumulo also supports a very specific kind of update which is very helpful in the case of sums and aggregates. For example, if I wanted to provide a "count", I can insert row X, column A, value 1:

(X, A, 1)

to indicate some event occurred. Eventually, in the database, there will be lots of these values at a row/column:

(X, A, 1)
(X, A, 1)
(X, A, 1)

You can insert code to reduce these values when you scan, kind of like a Combiner in a map/reduce job. This code will emit:

(X, A, 3)

The same code can also be incorporated into the compaction scheme, so eventually the database will actually store:

(X, A, 3)

This mechanism can be used to "sum" more complex information. The point is that you can take advantage of the log-structured merge tree to defer the computation to some point in the future when you have a very high degree of isolation. Yet you can still perform the computation as needed, right after you insert the information.

Of course, it does not provide a substitute for locks or transactions and may not cover your use case. But it covers a surprising number of them.

-Eric
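
The idea Eric describes can be modeled in a few lines of plain Python. This is only a toy sketch and not the Accumulo client API: the `Table` class, its method names, and `summing_combiner` are hypothetical names made up for illustration. It shows writers appending (X, A, 1) without reading first, a combine step applied at scan time, and the same step applied at compaction.

```python
from collections import defaultdict


def summing_combiner(entries):
    """Collapse all values that share a (row, column) key into their sum."""
    totals = defaultdict(int)
    for row, column, value in entries:
        totals[(row, column)] += value
    return sorted((row, column, total) for (row, column), total in totals.items())


class Table:
    """Toy append-only table (hypothetical; not the Accumulo API)."""

    def __init__(self):
        self.entries = []

    def insert(self, row, column, value):
        # Writers never read the current value; they only append.
        self.entries.append((row, column, value))

    def scan(self):
        # Scan scope: combine on the fly, stored entries are left untouched.
        return summing_combiner(self.entries)

    def compact(self):
        # Compaction scope: rewrite the stored entries in combined form.
        self.entries = summing_combiner(self.entries)


table = Table()
for _ in range(3):
    table.insert("X", "A", 1)

print(table.scan())        # [('X', 'A', 3)]
table.compact()            # the table now stores a single (X, A, 3) entry
table.insert("X", "A", 1)  # a later event arrives after compaction
print(table.scan())        # [('X', 'A', 4)]
```

Because no writer ever reads before writing, there is no read/update window for two clients to race in; the sum is computed later from whatever entries exist.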
-
Re: Concurrent updates
David Medinets 2012-10-17, 01:03
On Tue, Oct 16, 2012 at 6:19 PM, Eric Newton <[EMAIL PROTECTED]> wrote:
> Accumulo also supports a very specific kind of update which is very
> helpful in the case of sums and aggregates.

I'll ask the obvious follow-up question. If two kinds of sums are needed (e.g., summing by day and by week), would an event need to be written to two tables?
-
Re: Concurrent updates
Mike Drob 2012-10-17, 01:15
One option is to set the aggregation only for the scan scope, and then you can programmatically choose which aggregation you need.
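
As a rough illustration of this option, the sketch below keeps the raw events uncombined and picks the rollup when scanning. It is plain Python, not Accumulo code; the sample events and the names `by_day`, `by_week`, and `scan` are assumptions made up for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw events, stored uncombined: (day, count).
events = [
    (date(2012, 10, 15), 1),
    (date(2012, 10, 16), 1),
    (date(2012, 10, 16), 1),
    (date(2012, 10, 22), 1),
]


def by_day(day):
    """Group key for a daily rollup."""
    return day.isoformat()


def by_week(day):
    """Group key for a weekly rollup, using the ISO year and week number."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def scan(entries, group_key):
    """Sum values at scan time under whichever grouping the caller picks."""
    totals = defaultdict(int)
    for day, value in entries:
        totals[group_key(day)] += value
    return dict(totals)


daily = scan(events, by_day)
weekly = scan(events, by_week)
print(daily)
print(weekly)
```

The stored data is the same for both rollups; only the function passed to `scan` changes.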
-
Re: Concurrent updates
Christopher Tubbs 2012-10-17, 06:32
If you're going to do aggregation only at the scan scope, you'd probably want to make sure you don't have the versioning iterator turned on for the minc and majc scopes. Otherwise, your scans may look different over time between initial ingest and later, when the data has been compacted on disk.

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii
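
A toy model of the effect described here, under the assumption that a versioning step at compaction keeps only the newest entry per key. The functions `versioning` and `scan_sum` are hypothetical stand-ins written for this sketch, not Accumulo classes.

```python
def versioning(entries, max_versions=1):
    """Keep only the newest max_versions entries per key (toy model)."""
    kept = {}
    # Sort by key, newest timestamp first, then keep the first few per key.
    for key, timestamp, value in sorted(entries, key=lambda e: (e[0], -e[1])):
        group = kept.setdefault(key, [])
        if len(group) < max_versions:
            group.append((key, timestamp, value))
    return [entry for group in kept.values() for entry in group]


def scan_sum(entries):
    """Scan-scope aggregation: sum every stored value per key."""
    totals = {}
    for key, _timestamp, value in entries:
        totals[key] = totals.get(key, 0) + value
    return totals


# Three events for the same row/column: (key, timestamp, value).
stored = [(("X", "A"), 1, 1), (("X", "A"), 2, 1), (("X", "A"), 3, 1)]

before = scan_sum(stored)              # right after ingest
after = scan_sum(versioning(stored))   # after a compaction that versions

print(before)  # {('X', 'A'): 3}
print(after)   # {('X', 'A'): 1}
```

In this model the scan-time sum drops from 3 to 1 once the older entries are discarded, which is the kind of drift between ingest and post-compaction scans that the message warns about.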
-
Re: Concurrent updates
ameet kini 2012-10-17, 12:40
Alternately, for two rollups (summing by day and by week), you could insert into two different columns, each configured with its own combiner. The advantage of this scheme is that both combiners can be configured for all three scopes (scan, minc, and majc), so the versioning iterator will not interfere.

Ameet
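
A minimal sketch of this layout in plain Python, not the Accumulo API. The column names `daily` and `weekly`, the sample dates, and the helper functions are all hypothetical. Each event is written once to each column, and a summing step runs per column.

```python
from collections import defaultdict
from datetime import date


def week_label(day):
    """ISO year and week number, used as the weekly column qualifier."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def event_updates(row, day):
    """Turn one event into two cells in the same row, one per rollup."""
    return [
        (row, "daily", day.isoformat(), 1),
        (row, "weekly", week_label(day), 1),
    ]


def combine(cells):
    """Summing step applied to every (row, family, qualifier) key."""
    totals = defaultdict(int)
    for row, family, qualifier, value in cells:
        totals[(row, family, qualifier)] += value
    return dict(totals)


cells = []
for day in [date(2012, 10, 15), date(2012, 10, 16),
            date(2012, 10, 16), date(2012, 10, 22)]:
    cells.extend(event_updates("X", day))

rollups = combine(cells)
for key in sorted(rollups):
    print(key, rollups[key])
```

Since both rollups live in one row of one table, this answers the earlier question without a second table, at the cost of writing two cells per event.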
-
RE: Concurrent updates
Sami Omer 2012-10-17, 15:56
Thanks for the detailed response Eric, much appreciated!