-
Best way to do a clean update of a row
Ferdy 2010-03-08, 12:53
Hi,
Sometimes we wish to do a clean update of a row, that is: make sure any old column values that are not in the new Put are removed. This is how we're doing this now (HBase 0.20.3):

    // delRow and putRow are the same row,
    // but the row may currently contain columns that are not redefined in putRow
    HTable htable = new HTable("tablename");
    htable.delete(delRow);
    htable.put(putRow);

We just call these sequentially (single-threaded). However, could it be possible that the delete is somehow issued AFTER the put? The htable object has default settings (in other words there is no ). The reason I'm asking is that we are probably experiencing missing-row issues. If so, is there a better way to update a row and discard old column values?

Regards,
Ferdy
-
Re: Best way to do a clean update of a row
Erik Holstad 2010-03-08, 15:09
Hey Ferdy!
There has been a lot of talk about this lately. HBase has a resolution of milliseconds, so if you do a put and a get in the same milli the put will not be shown. There are a couple of solutions to this problem: waiting one millisecond before the put, setting the timestamps yourself, or doing some kind of swap between two rows.

Erik
-
Re: Best way to do a clean update of a row
Ferdy 2010-03-08, 15:48
Hey Erik,
Thanks for replying. Do you mean a delete and a put in the same milli? Otherwise I don't think I fully understand what you're saying.

Ferdy
-
Re: Best way to do a clean update of a row
Erik Holstad 2010-03-08, 15:58
Hey Ferdy!
Not really sure what you are asking now. But if you do a deleteRow and then a put in the same millisecond, the put will be "shadowed" by the delete, so it will not show up when you look for it later, if that makes sense? The reason for this is that deletes are sorted before puts for the same timestamp, so for a put to be viewable it needs to have a newer timestamp than the delete.

--
Regards
Erik
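The rule Erik describes can be sketched in a few lines of plain Java. This is a toy model under stated assumptions, not HBase code: the class `ShadowModel` and its method `isVisible` are hypothetical names, and the model only encodes the statement above that a put survives a row delete when its timestamp is strictly newer than the delete's.

```java
// Toy model of the shadowing rule described above, NOT HBase code.
// Class and method names are hypothetical.
class ShadowModel {

    // A row delete at deleteTs masks every put whose timestamp is <= deleteTs,
    // because for equal timestamps the delete sorts before the put.
    static boolean isVisible(long putTs, long deleteTs) {
        return putTs > deleteTs;
    }

    public static void main(String[] args) {
        long deleteTs = 1000L;
        System.out.println(isVisible(1000L, deleteTs)); // same millisecond: prints false
        System.out.println(isVisible(1001L, deleteTs)); // one millisecond later: prints true
    }
}
```

In this model the order in which the client issued the two calls does not matter at all; only the timestamps do, which is why a delete followed immediately by a put can lose the put.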
-
Re: Best way to do a clean update of a row
Ferdy 2010-03-08, 16:45
Hey,
Great! That is exactly what I meant. So that implies that firing a Delete and a Put right after each other is a pretty bad practice if you want the Put to persist. Please note, I only need one version. (All my families are VERSIONS => '1'.) I guess I have the following choice of solutions:

    // Solution A: Issue a client-side pause
    htable.delete(delete);
    try {Thread.sleep(10);} catch (InterruptedException e) {}
    htable.put(put);

But wait, the javadoc for Delete states that if no timestamp is specified, the SERVER will use the 'now' time. This means that the Delete and the Put can still be determined to have the same timestamp.

    // Solution B: specify timestamps
    long deleteTS = System.currentTimeMillis();
    long putTS = deleteTS + 1;
    Delete delete = new Delete(row, deleteTS, null);
    htable.delete(delete);
    Put put = new Put(row);
    put.add(family, column, putTS, value);
    htable.put(put);

How about this solution? I'm guessing the only disadvantage to this one is: a row written by a client machine with an incorrectly set system time (let's say a few days ahead) cannot be removed by another machine (with a correct system time) shortly after, because the deleteTS of the correct client will be smaller than the timestamp in the table.

Regards,
Ferdy
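The clock-skew disadvantage of Solution B can be illustrated with a small single-row model. This is only a sketch, not HBase code: `SkewModel`, `cleanUpdate`, and the column names are hypothetical, one version per column is assumed (as with VERSIONS => '1'), and visibility follows the rule from earlier in the thread that a cell must be strictly newer than the delete.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy single-row model, NOT HBase code. All names are hypothetical.
class SkewModel {
    private final Map<String, Long> cellTs = new TreeMap<>(); // column -> newest put timestamp
    private long deleteTs = Long.MIN_VALUE;                   // newest row delete marker

    void put(String column, long ts) {
        cellTs.merge(column, ts, Long::max);
    }

    void deleteRow(long ts) {
        deleteTs = Math.max(deleteTs, ts);
    }

    // A cell is visible only if it is strictly newer than the delete marker.
    Set<String> visibleColumns() {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, Long> e : cellTs.entrySet()) {
            if (e.getValue() > deleteTs) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    // Solution B: delete at the client's "now", then put the new columns at now + 1.
    void cleanUpdate(long clientNow, String... columns) {
        deleteRow(clientNow);
        for (String c : columns) {
            put(c, clientNow + 1);
        }
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long twoDaysMs = 2L * 24 * 60 * 60 * 1000;

        SkewModel row = new SkewModel();
        row.put("old", now + twoDaysMs); // written by a client whose clock runs two days ahead
        row.cleanUpdate(now, "new");     // clean update from a client with a correct clock
        System.out.println(row.visibleColumns()); // prints [new, old]: "old" was not removed
    }
}
```

With accurate clocks the same sequence leaves only the new column visible, which is the behavior Solution B is meant to provide.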
-
Re: Best way to do a clean update of a row
Erik Holstad 2010-03-08, 16:57
Hey Ferdy!
On Mon, Mar 8, 2010 at 8:45 AM, Ferdy <[EMAIL PROTECTED]> wrote:
> // Solution A: Issue a client-side pause
> htable.delete(delete);
> try {Thread.sleep(10);} catch (InterruptedException e) {}
> htable.put(put);
>
> But wait, the javadoc for Delete states that if no timestamp is specified,
> the SERVER will use the 'now' time. This means that the Delete and the
> Put can still be determined to have the same timestamp.

Not really sure why they would still get the same timestamp if you wait 10 millis on the client; it should be the same resolution on the server, right?

> // Solution B: specify timestamps
> long deleteTS = System.currentTimeMillis();
> long putTS = deleteTS + 1;
> Delete delete = new Delete(row, deleteTS, null);
> htable.delete(delete);
> Put put = new Put(row);
> put.add(family, column, putTS, value);
> htable.put(put);
>
> How about this solution? I'm guessing the only disadvantage to this one is:
> A client machine with an incorrectly set systemtime (let's say a few days
> ahead) will not be able to be removed by another machine (with a correct
> systemtime) shortly after, because the deleteTS of the correct client will
> be smaller than the timestamp in the table.

This is the reason that it might be tricky to use your own client timestamp, and it makes letting the server set the timestamps a better option. But it seems like you have a good understanding of the consequences, so good luck!

--
Regards
Erik
-
RE: Best way to do a clean update of a row
Jonathan Gray 2010-03-08, 17:08
Ferdy,
Another strategy might be to not issue the delete and just insert a new version on top of the old one. Whether this makes sense or not depends on whether the columns for that row change between versions. If it's always the same columns, then you can just re-insert, and when you grab the latest version you will only see the new one. If they change, you would need to follow one of your other strategies.

I would probably not use solution A, just because there's not really a need to introduce a client-side pause. I would opt for grabbing now() and incrementing the Put stamp by 1.

This issue is currently under discussion, and we'd really like to make this kind of unexpected (but understandable) behavior a little more user friendly, so that if you put after a delete you would actually see it. There's no estimated time for it, but until then you can try the workarounds.

JG
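Jonathan's point about overwriting without a delete can be sketched as follows. This is a toy model, not HBase code: `OverwriteModel` and `putOnTop` are hypothetical names, and the map stands in for "the latest version of each column" in one row.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of "just insert a new version on top", NOT HBase code.
// Names are hypothetical; each map holds the latest value per column.
class OverwriteModel {

    // Returns what a reader of the latest versions would see after the update.
    static Map<String, String> putOnTop(Map<String, String> row, Map<String, String> update) {
        Map<String, String> result = new TreeMap<>(row);
        result.putAll(update); // newer versions win, columns not in the update stay
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("a", "1");
        row.put("b", "2");

        Map<String, String> sameColumns = new TreeMap<>();
        sameColumns.put("a", "10");
        sameColumns.put("b", "20");
        System.out.println(putOnTop(row, sameColumns)); // prints {a=10, b=20}

        Map<String, String> changedColumns = new TreeMap<>();
        changedColumns.put("a", "10");
        changedColumns.put("c", "30");
        System.out.println(putOnTop(row, changedColumns)); // prints {a=10, b=2, c=30}
    }
}
```

The second case shows why this strategy only gives a clean update when the column set never changes: the stale column "b" is still there.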
-
Re: Best way to do a clean update of a row
Ferdy 2010-03-09, 10:34
Hey,
The column names indeed change between versions. For now I will adopt solution B and accept the fact that in very rare cases old columns may not be deleted (which could happen when a client does a put with a clock that is ahead). This shouldn't occur very often, since our system times are pretty accurate and updates will not happen more frequently than once every hour.

Ferdy