|
|
-
incrementColumnValue, WAL and timestamp
Stack 2011-01-12, 19:53
This is how incrementColumnValue looks currently. I have a few questions on it (I'm trying to add a sequence number to KeyValue -- hbase-2856).
public long incrementColumnValue(byte [] row, byte [] family, byte [] qualifier, long amount, boolean writeToWAL) throws IOException { ... try { ... // Get the old value: Get get = new Get(row); get.addColumn(family, qualifier);
.. // build the KeyValue now: KeyValue newKv = new KeyValue(row, family, qualifier, EnvironmentEdgeManager.currentTimeMillis(), Bytes.toBytes(result));
// Now log it: if (writeToWAL) { long now = EnvironmentEdgeManager.currentTimeMillis(); WALEdit walEdit = new WALEdit(); walEdit.add(newKv); this.log.append(regionInfo, regionInfo.getTableDesc().getName(), walEdit, now); }
// Now request the ICV to the store, this will set the timestamp // appropriately depending on if there is a value in memcache or not. // returns the change in the size of the memstore from operation long size = store.updateColumnValue(row, family, qualifier, result);
.... } finally { releaseRowLock(lid); } } finally { closeRegionOperation(); }
....
return result; }
Ignoring stuff like the double EnvironmentEdgeManager.currentTimeMillis() call -- i'll fix that -- the thing that is of interest to me is this allowance that what is written to WAL is not necessarily the edit that makes it out to the persisted store file (because the timestamp for the ICV is set later up in the memstore and the timestamp may not be 'now' but rather will be the largest timestamp currently in memstore).
This seems off to me. What do you lot see as repercussions of my writing into memstore the newKv we made above? Are increments going to come in out of order, is that what we're trying to protect against? Or are we trying to protect against some errant edit that has an inflated timestamp up in memstore?
Thanks, St.Ack
-
RE: incrementColumnValue, WAL and timestamp
Jonathan Gray 2011-01-12, 20:02
I think there are some hacky fixes in there to prevent duplicate timestamps.
With the new seqid, this should not be an issue. I think we should then be able to generate a single now() value and use it everywhere (MemStore, HLog). Duplicate timestamps won't be a problem as the seqid ordering will take care of it and we can still use the same logic to cleanup old values.
A separate issue is how "atomic" we want counters to be. Currently the multi-column Increment operation does not utilize any of the RWCC stuff. I guess here we should just use seqid now? Is it one seqid per KV or per "row transaction"?
JG
> -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Stack > Sent: Wednesday, January 12, 2011 11:54 AM > To: HBase Dev List > Subject: incrementColumnValue, WAL and timestamp > > This is how incrementColumnValue looks currently. I have a few questions > on it (I'm trying to add a sequence number to KeyValue -- hbase-2856). > > public long incrementColumnValue(byte [] row, byte [] family, > byte [] qualifier, long amount, boolean writeToWAL) > throws IOException { > ... > try { > ... > // Get the old value: > Get get = new Get(row); > get.addColumn(family, qualifier); > > .. > // build the KeyValue now: > KeyValue newKv = new KeyValue(row, family, > qualifier, EnvironmentEdgeManager.currentTimeMillis(), > Bytes.toBytes(result)); > > // Now log it: > if (writeToWAL) { > long now = EnvironmentEdgeManager.currentTimeMillis(); > WALEdit walEdit = new WALEdit(); > walEdit.add(newKv); > this.log.append(regionInfo, regionInfo.getTableDesc().getName(), > walEdit, now); > } > > // Now request the ICV to the store, this will set the timestamp > // appropriately depending on if there is a value in memcache or not. > // returns the change in the size of the memstore from operation > long size = store.updateColumnValue(row, family, qualifier, result); > > .... > } finally { > releaseRowLock(lid); > } > } finally { > closeRegionOperation(); > } > > .... > > return result; > } > > Ignoring stuff like the double > EnvironmentEdgeManager.currentTimeMillis() call -- i'll fix that -- the thing > that is of interest to me is this allowance that what is written to WAL is not > necessarily the edit that makes it out to the persisted store file (because the > timestamp for the ICV is set later up in the memstore and the timestamp > may not be 'now' but rather will be the largest timestamp currently in > memstore). > > This seems off to me. What do you lot see as repercussions of my writing > into memstore the newKv we made above? Are increments going to come > in out of order, is that what we're trying to protect against? > Or are we trying to protect against some errant edit that has an inflated > timestamp up in memstore? > > Thanks, > St.Ack
-
Re: incrementColumnValue, WAL and timestamp
Stack 2011-01-12, 20:14
On Wed, Jan 12, 2011 at 12:02 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote: > I think there are some hacky fixes in there to prevent duplicate timestamps. >
Ok.
> With the new seqid, this should not be an issue. I think we should then be able to generate a single now() value and use it everywhere (MemStore, HLog). Duplicate timestamps won't be a problem as the seqid ordering will take care of it and we can still use the same logic to cleanup old values. >
Right.
OK if I strip back all I regard as 'workaround' code as part of my first cut at 2856?
> A separate issue is how "atomic" we want counters to be. Currently the multi-column Increment operation does not utilize any of the RWCC stuff. I guess here we should just use seqid now? Is it one seqid per KV or per "row transaction"? >
I chatted w/ Ryan on this yesterday. For single increment, it uses RWCC. We'd need to change the ordering of RWCC updates a little so that the increment was immediately available (previous we had a memstoreTs set to 0 as a means for doing this -- this goes away now). I'll have to look at the multi-increment. Sounds like it should go under RWCC controls.
Currently planning one sequence number per batch of edits, per write to WAL.
St.Ack
-
Re: incrementColumnValue, WAL and timestamp
Ryan Rawson 2011-01-12, 20:32
That code was written to handle the pathological case where the same timestamp would be in memstore and hfile and we'd get a bad read and base counter updates on the wrong value. That is we'd have 2 KVs: KV1 ts=1 value=1500 KV2 ts=1 value=500
and we'd base the increment on KV2, thus 'losing' 1000 increments. With seqid we can determine that KV1 > KV2 even though ts=1 for both and resolve without doing the Wrong Thing. That we don't use the same KV is because you don't want to mutate a KV that someone else in a different thread (log flushing) might depend on. Unfortunately updating timestamp which is a long isnt atomic in KV, since you are updating 8 byte fields one at a time.
Increment has always been atomic, and making multi increment atomic will be a little trickier, the pseudo code is: - beginMemstoreInsert(seqid); - store.memstore.add(kv); - commitMemstoreInsert(seqid); - store.memstore.remove_versions(kv);
the last being necessary to reduce version bloat. On Wed, Jan 12, 2011 at 12:14 PM, Stack <[EMAIL PROTECTED]> wrote: > On Wed, Jan 12, 2011 at 12:02 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote: >> I think there are some hacky fixes in there to prevent duplicate timestamps. >> > > Ok. > >> With the new seqid, this should not be an issue. I think we should then be able to generate a single now() value and use it everywhere (MemStore, HLog). Duplicate timestamps won't be a problem as the seqid ordering will take care of it and we can still use the same logic to cleanup old values. >> > > Right. > > OK if I strip back all I regard as 'workaround' code as part of my > first cut at 2856? > >> A separate issue is how "atomic" we want counters to be. Currently the multi-column Increment operation does not utilize any of the RWCC stuff. I guess here we should just use seqid now? Is it one seqid per KV or per "row transaction"? >> > > I chatted w/ Ryan on this yesterday. For single increment, it uses > RWCC. We'd need to change the ordering of RWCC updates a little so > that the increment was immediately available (previous we had a > memstoreTs set to 0 as a means for doing this -- this goes away now). > I'll have to look at the multi-increment. Sounds like it should go > under RWCC controls. > > Currently planning one sequence number per batch of edits, per write to WAL. > > St.Ack >
|
|