Tatsuya Kawano
2010-04-28, 14:40
Stack
2010-04-28, 16:42
Ryan Rawson
2010-04-29, 04:41
Tatsuya Kawano
2010-04-29, 08:33
Todd Lipcon
2010-04-29, 16:36
Guilherme Germoglio
2010-04-29, 16:58
Michael Segel
2010-04-29, 20:09
Tatsuya Kawano
2010-04-30, 16:31
Tatsuya Kawano
2010-05-08, 23:21
-
Unique row ID constraint
Tatsuya Kawano 2010-04-28, 14:40
Hi,

I'd like to implement a unique row ID constraint (like the primary key constraint in an RDBMS) in my application framework.

Here is a code fragment from my current implementation (HBase 0.20.4rc), written in Scala. It works as expected, but is there a better (shorter) way to do this, such as checkAndPut()? I'd like to pass a single Put object to my function (method) rather than passing rowId, family, qualifier and value separately. I can't do this now because I have to give the rowLock object when I instantiate the Put.

==============================================
def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
           qualifier: Array[Byte], value: Array[Byte]): Unit = {

  val get = new Get(rowId)

  val lock = table.lockRow(rowId) // will expire in one minute
  try {
    if (table.exists(get)) {
      throw new DuplicateRowException("Tried to insert a duplicate row: "
          + Bytes.toString(rowId))

    } else {
      val put = new Put(rowId, lock)
      put.add(family, qualifier, value)

      table.put(put)
    }

  } finally {
    table.unlockRow(lock)
  }

}
==============================================

Thanks,

--
河野 達也
Tatsuya Kawano (Mr.)
Tokyo, Japan

twitter: http://twitter.com/tatsuya6502
-
Re: Unique row ID constraint
Stack 2010-04-28, 16:42
Would incrementColumnValue [1] work for this?

St.Ack

1. http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29
-
Re: Unique row ID constraint
Ryan Rawson 2010-04-29, 04:41
I would strongly discourage people from building on top of lockRow/unlockRow. The problem is that if a row is not available, lockRow will hold a responder thread, and you can end up with a deadlock because the lock holder won't be able to unlock. Sure, the expiry system kicks in, but 60 seconds is kind of infinity in database terms :-)

I would probably go with either ICV (incrementColumnValue) or CAS (compare-and-swap) to build the tools you want. With CAS you can accomplish a lot of the things locking accomplishes, but more efficiently.
-
Re: Unique row ID constraint
Tatsuya Kawano 2010-04-29, 08:33
Hi Stack and Ryan,

Thanks for your advice. I knew using a row lock wasn't ideal, but I couldn't find an appropriate atomic operation to do Compare And Swap.

So, thanks Stack for helping me find it. I found that the atomic incrementColumnValue() operation just works for me, since it automatically initializes the column value to 0 when the column doesn't exist. I can try to increment the column value by 1, and if it returns 1, I can be sure that I'm the first one to have created the column and row.

So, my updated code is much simpler and now lock-free.

==============================================
def insert(table: HTable, put: Put): Unit = {
  val count = table.incrementColumnValue(put.getRow, family, uniqueQual, 1)

  if (count == 1) {
    table.put(put)

  } else {
    throw new DuplicateRowException("Tried to insert a duplicate row: "
        + Bytes.toString(put.getRow))
  }
}
==============================================

Thanks,
Tatsuya
-
Re: Unique row ID constraint
Todd Lipcon 2010-04-29, 16:36
Hi Tatsuya,

Note that your solution is not correct in the case of failure, since the check and the put are not atomic with each other.

If your client or server fails between the ICV and the put, no other clients will be able to put the row, but there will be no data.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
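
The failure window described here can be reproduced without a cluster. The following is a minimal sketch, not HBase code: `ToyTable` is a hypothetical in-memory stand-in whose `incrementColumnValue` only imitates the "start from 0, increment atomically" behaviour discussed in this thread, and `crashBeforePut` is an invented flag that simulates a client dying between the two calls.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a table; this is NOT the HBase client API.
class ToyTable {
  private val counters = mutable.Map.empty[String, Long]
  private val rows = mutable.Map.empty[String, String]

  // Atomic increment that starts from 0 when the counter does not exist yet.
  def incrementColumnValue(row: String, amount: Long): Long = synchronized {
    val next = counters.getOrElse(row, 0L) + amount
    counters(row) = next
    next
  }

  def put(row: String, value: String): Unit = synchronized { rows(row) = value }

  def get(row: String): Option[String] = synchronized { rows.get(row) }
}

object IcvGapDemo {
  // Returns true if the row was written, false if the key was already reserved.
  // crashBeforePut simulates a client failure after the increment, before the put.
  def insert(table: ToyTable, row: String, value: String,
             crashBeforePut: Boolean = false): Boolean = {
    if (table.incrementColumnValue(row, 1L) != 1L) false
    else if (crashBeforePut) throw new RuntimeException("simulated crash after ICV, before put")
    else { table.put(row, value); true }
  }

  def main(args: Array[String]): Unit = {
    val table = new ToyTable
    try insert(table, "row-1", "first", crashBeforePut = true)
    catch { case e: RuntimeException => println("client 1: " + e.getMessage) }
    // The counter is now 1, so every later insert sees a value above 1 and gives up.
    println("client 2 inserted: " + insert(table, "row-1", "second"))
    println("stored value: " + table.get("row-1"))
  }
}
```

In this toy model the second client is rejected even though no data was ever stored for the key, which is the "reserved but empty" state the message warns about.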
-
Re: Unique row ID constraint
Guilherme Germoglio 2010-04-29, 16:58
Hello Tatsuya,

Can the keys be randomly generated, or must they be incremental? Remember that you can achieve higher throughput if they are randomly generated, since the insertions will likely load all machines more evenly.

Using UUIDs may ensure key uniqueness (I don't expect a UUID clash any time soon :-) and balance the load over the cluster, but if you are paranoid enough you can also check whether a row already exists by using checkAndPut<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[], byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)> (just check for an empty byte array value in a column that you can ensure always has some value).

Guilherme

msn: [EMAIL PROTECTED]
homepage: http://sites.google.com/site/germoglio/
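
For the random-key option, one possible way to turn a UUID into a row key is sketched below. It uses only `java.util.UUID` and `java.nio.ByteBuffer`; the object name `UuidRowKey` and the 16-byte big-endian layout are illustrative assumptions, not something prescribed by HBase or by this thread.

```scala
import java.nio.ByteBuffer
import java.util.UUID

// Hypothetical helper: encodes a UUID as a fixed-width 16-byte row key.
object UuidRowKey {
  // Most significant 8 bytes first, then the least significant 8 bytes.
  def toBytes(id: UUID): Array[Byte] =
    ByteBuffer.allocate(16)
      .putLong(id.getMostSignificantBits())
      .putLong(id.getLeastSignificantBits())
      .array()

  // Inverse of toBytes, useful when reading keys back.
  def fromBytes(key: Array[Byte]): UUID = {
    val buf = ByteBuffer.wrap(key)
    val most = buf.getLong()
    val least = buf.getLong()
    new UUID(most, least)
  }

  // A fresh random (version 4) UUID encoded as a row key.
  def newRowKey(): Array[Byte] = toBytes(UUID.randomUUID())

  def main(args: Array[String]): Unit = {
    val key = newRowKey()
    println("key length: " + key.length)
    println("round trip: " + fromBytes(key))
  }
}
```

Random keys spread writes across the key space, but they give up the ability to scan rows in a meaningful business order, so this only fits tables that are read by exact key.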
-
RE: Unique row ID constraint
Michael Segel 2010-04-29, 20:09
UUIDs won't clash. Especially if you're using version 5, which is built from a truncated SHA-1 hash of a namespace UUID and a name.
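
Whether two UUIDs can clash depends on how they are generated: random (version 4) UUIDs are independent draws, while name-based UUIDs (versions 3 and 5) are deterministic, so the same name always yields the same UUID. The sketch below shows this with `java.util.UUID` only. The JDK ships a random (v4) generator and an MD5 name-based (v3) generator but no built-in version 5 generator, so version 3 stands in for the name-based case here; the name "customer-42/order-7" is a made-up example key.

```scala
import java.nio.charset.StandardCharsets
import java.util.UUID

object UuidKinds {
  // Name-based (version 3, MD5) UUID: equal names map to equal UUIDs.
  def nameBased(name: String): UUID =
    UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    val a = nameBased("customer-42/order-7")
    val b = nameBased("customer-42/order-7")
    println("name-based UUIDs equal: " + (a == b))
    println("name-based version: " + a.version())
    println("random version: " + UUID.randomUUID().version())
  }
}
```

So a name-based UUID derived from business data behaves like the business key itself with respect to duplicates, and a duplicate check is still needed if uniqueness must be enforced.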
-
Re: Unique row ID constraint
Tatsuya Kawano 2010-04-30, 16:31
Thanks all for your responses; they are very helpful.

4/30/2010 Todd Lipcon <[EMAIL PROTECTED]>:
> Note that your solution is not correct in the case of failure, since the
> check and put are not atomic with each other.
>
> If your client or server fails between the ICV and the put, no other clients
> will be able to put the row, but there will be no data.

I agree; my solution is a bit fragile. If I stick with this plan, I could try to delete the counter after the put fails. However, it seems the delete won't work either, because the put failure may be caused by a network disruption, a region server problem, etc. So, I'm going to have to keep some kind of failure log so I can remove the reserved key later by hand.

4/30/2010 Guilherme Germoglio <[EMAIL PROTECTED]>:
> Can the keys be randomly generated or they must be incremental? Remember
> that you can achieve higher throughput if they are randomly generated since
> the insertions will possibly load all machines more evenly.
>
> Using UUIDs may ensure key uniqueness (I don't hope a UUID clash soon :-)
> and load balance over the cluster,

4/30/2010 Michael Segel <[EMAIL PROTECTED]>:
> UUIDs won't clash. Especially if you're using version 5, which is built from a truncated SHA-1 hash of a namespace UUID and a name.

Thanks for the info. Well, in my case, I'd like to use a combination of business data as the row key, so I can scan the rows. But I'll keep the UUID option in mind for other cases.

4/30/2010 Guilherme Germoglio <[EMAIL PROTECTED]>:
> but if you are paranoid enough you can
> also check whether a row already exists by using
> checkAndPut<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[],
> byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)> (just check for
> an empty byte array values in a column that you can ensure it has always
> some value).

So, checkAndPut() seems ideal for my case. I didn't realize I could use it to check whether a row already exists. I'll give it a try!

Thanks,
Tatsuya

--
河野 達也
Tatsuya Kawano (Mr.)
Tokyo, Japan

twitter: http://twitter.com/tatsuya6502
-
Re: Unique row ID constraint
Tatsuya Kawano 2010-05-08, 23:21
OK. I found that HTable#checkAndPut() works perfectly for me.

Here is my final code (in Scala). Thanks, Bruno, for writing the blog article on this topic. It was very informative.

Outerthought :: HBase row locks
http://outerthought.org/blog/blog/380-OTC.html

==============================================
lazy val UniqueIndexQualifier = "unq".getBytes
lazy val AbsenceMarker = Array[Byte]() // Empty byte array
lazy val ExistenceMarker = Array[Byte](0x01)

def insert(table: HTable, put: Put): Unit = {
  // The marker column is written together with the data in the same Put.
  put.add(Family, UniqueIndexQualifier, ExistenceMarker)

  // Succeeds only if the marker column is still absent (empty).
  val succeeded = table.checkAndPut(put.getRow, Family, UniqueIndexQualifier,
                                    AbsenceMarker, put)
  if (! succeeded) {
    throw new DuplicateRowException("Tried to insert a duplicate row: "
        + Bytes.toString(put.getRow))
  }
}

def update(table: HTable, put: Put): Unit = {
  // Succeeds only if the marker column already holds the existence marker.
  val succeeded = table.checkAndPut(put.getRow, Family, UniqueIndexQualifier,
                                    ExistenceMarker, put)
  if (! succeeded) {
    throw new RowNotFoundException("Tried to update a non-existing row: "
        + Bytes.toString(put.getRow))
  }
}
==============================================

Thanks,

--
河野 達也
Tatsuya Kawano (Mr.)
Tokyo, Japan

twitter: http://twitter.com/tatsuya6502
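
The insert/update pattern above can be tried out without a cluster. This is a minimal sketch under stated assumptions: `ToyCasTable` is a hypothetical in-memory model whose `checkAndPut` imitates "compare one column, then write several columns atomically", with `None` playing the role of the absence marker. It is not the HBase client API, and the qualifier names `unq` and `data` are illustrative.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of a table with atomic check-and-put; NOT the HBase API.
class ToyCasTable {
  private val cells = mutable.Map.empty[(String, String), String]

  // Writes all columns only if the checked column currently equals `expected`
  // (None means the column must be absent). Check and write happen under one lock.
  def checkAndPut(row: String, qualifier: String, expected: Option[String],
                  columns: Map[String, String]): Boolean = synchronized {
    if (cells.get((row, qualifier)) != expected) false
    else {
      columns.foreach { case (q, v) => cells((row, q)) = v }
      true
    }
  }

  def get(row: String, qualifier: String): Option[String] =
    synchronized { cells.get((row, qualifier)) }
}

object UniqueRowDemo {
  val Marker = "unq"     // marker column, mirrors UniqueIndexQualifier in the message
  val Exists = "1"       // mirrors ExistenceMarker

  // Insert succeeds only when the marker column is absent.
  def insert(t: ToyCasTable, row: String, value: String): Boolean =
    t.checkAndPut(row, Marker, None, Map(Marker -> Exists, "data" -> value))

  // Update succeeds only when the marker column is already present.
  def update(t: ToyCasTable, row: String, value: String): Boolean =
    t.checkAndPut(row, Marker, Some(Exists), Map("data" -> value))

  def main(args: Array[String]): Unit = {
    val t = new ToyCasTable
    println("first insert: " + insert(t, "row-1", "a"))
    println("second insert: " + insert(t, "row-1", "b"))
    println("update existing: " + update(t, "row-1", "c"))
    println("update missing: " + update(t, "row-2", "d"))
  }
}
```

Because the marker and the data are written in the same atomic step, this model has no state in which a key is reserved but carries no data, unlike the increment-then-put approach earlier in the thread.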