HBase >> mail # user >> Key Value collision

Varun Sharma 2013-05-16, 18:49
Michael Segel 2013-05-16, 19:00
Jeff Kolesky 2013-05-16, 20:05
Michael Segel 2013-05-16, 20:50
Re: Key Value collision
On Thu, May 16, 2013 at 11:49 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Hi,
> I am wondering what happens when we add the following:
> row, col, timestamp --> v1
> A flush happens. Now, we add
> row, col, timestamp --> v2
> A flush happens again. In this case if MAX_VERSIONS == 1, how is the tie
> broken during reads and during minor compactions, is it arbitrary ?

See what we have to say at
http://hbase.apache.org/book.html#versions. I believe your question is
answered therein (it's what Michael says).

Here are a few tools in case you want to verify or interrogate its behavior
for yourself.

Start up a local instance.

Then start up a shell, create a table, insert a row then flush:

durruti:hbase-0.94.7 stack$ ./bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.7, r1471806, Wed Apr 24 18:48:26 PDT 2013

hbase(main):001:0> create 't', 'f'
2013-05-16 22:55:02.804 java[86479:1203] Unable to load realm info from
0 row(s) in 5.5630 seconds

hbase(main):002:0> put 't', 'r', 'f:q', 'some value', 2
0 row(s) in 0.0800 seconds

hbase(main):003:0> flush 't'
0 row(s) in 5.4300 seconds

Check that you have a flushed file with the expected content (my data is in
the default /tmp/hbase-USER dir):

durruti:hbase-0.94.7 stack$ ./bin/hbase
org.apache.hadoop.hbase.io.hfile.HFile --printkv -f
13/05/16 22:57:02 INFO util.ChecksumType: Checksum can use
2013-05-16 22:57:02.467 java[86593:1203] Unable to load realm info from
13/05/16 22:57:02 INFO hfile.CacheConfig: Allocating LruBlockCache with
maximum size 246.9m
13/05/16 22:57:02 ERROR metrics.SchemaMetrics: Inconsistent configuration.
Previous configuration for using table name in metrics: true, new
configuration: false
K: r/f:q/2/Put/vlen=10/ts=0 V: some value
Scanned kv count -> 1
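
A note on reading the --printkv line above: the key appears to be printed as row/family:qualifier/timestamp/type, followed by the value length and a trailing ts= field. That trailing field is assumed here to be the memstore (MVCC) stamp rather than the cell timestamp, which would explain why it reads 0 while the cell timestamp is 2. A minimal sketch of splitting such a line apart follows; parse_printkv is a made-up helper, not part of HBase, and it assumes the row, family and qualifier contain no '/' or ':' characters.

```python
import re

# Hypothetical helper: split one line of `HFile --printkv` output into fields.
# Assumed layout (taken from the single sample line in this thread):
#   K: row/family:qualifier/timestamp/type/vlen=N/ts=M V: value
LINE_RE = re.compile(
    r"^K: (?P<row>[^/]*)/(?P<family>[^:]*):(?P<qualifier>[^/]*)/"
    r"(?P<timestamp>\d+)/(?P<type>\w+)/vlen=(?P<vlen>\d+)/ts=(?P<mvcc>\d+)"
    r" V: (?P<value>.*)$"
)

def parse_printkv(line):
    """Return the fields of one printed KeyValue as a dict."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized line: %r" % line)
    fields = m.groupdict()
    # Convert the numeric fields; "mvcc" is our own name for the trailing ts=.
    for name in ("timestamp", "vlen", "mvcc"):
        fields[name] = int(fields[name])
    return fields

kv = parse_printkv("K: r/f:q/2/Put/vlen=10/ts=0 V: some value")
print(kv["row"], kv["family"], kv["qualifier"], kv["timestamp"], kv["value"])
```
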

Back in the shell, do another insert (and, optionally, a flush) with a
different value at the same timestamp of 2:

hbase(main):004:0> put 't', 'r', 'f:q', 'more recently written value', 2
0 row(s) in 0.0050 seconds

# Throw in a flush if you want...

hbase(main):011:0> get 't', 'r', {COLUMN => 'f:q', VERSIONS => 1}
COLUMN                                               CELL
 f:q                                                 timestamp=2, value=more recently written value
1 row(s) in 0.0180 seconds
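
The get above returns the later write even though both cells share the same row, column and timestamp. A toy model of that read-time tie-break can make the behavior easier to reason about. It assumes that cells comparing equal on key are ordered by how recently their store was written, with the memstore treated as newest; that is an interpretation of the result above, not a statement of the actual scanner code. Cell, seq and read_latest are made-up names for illustration only.

```python
from collections import namedtuple

# Toy cell. `seq` is a hypothetical stand-in for the write order of the store
# (flushed file or memstore) holding the cell; larger means written later.
Cell = namedtuple("Cell", "row col ts seq value")

def read_latest(cells, row, col, max_versions=1):
    """Return up to max_versions cells for (row, col), newest first."""
    matching = [c for c in cells if c.row == row and c.col == col]
    # Newest timestamp first; on a timestamp tie, most recently written store first.
    matching.sort(key=lambda c: (-c.ts, -c.seq))
    result, seen_ts = [], set()
    for c in matching:
        if c.ts in seen_ts:
            continue  # same coordinates already served from a newer store
        seen_ts.add(c.ts)
        result.append(c)
        if len(result) == max_versions:
            break
    return result

cells = [
    Cell("r", "f:q", 2, 1, "some value"),                   # first put, flushed
    Cell("r", "f:q", 2, 2, "more recently written value"),  # later put, same timestamp
]
winner = read_latest(cells, "r", "f:q")
print(winner[0].value)
```

Under this model the older cell is shadowed rather than returned as a second version, which matches what the shell session shows for VERSIONS => 1.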