HBase, mail # user - Key Value collision


Re: Key Value collision
Stack 2013-05-17, 06:08
On Thu, May 16, 2013 at 11:49 AM, Varun Sharma <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am wondering what happens when we add the following:
>
> row, col, timestamp --> v1
>
> A flush happens. Now, we add
>
> row, col, timestamp --> v2
>
> A flush happens again. In this case if MAX_VERSIONS == 1, how is the tie
> broken during reads and during minor compactions, is it arbitrary ?
>

See what we have to say at http://hbase.apache.org/book.html#versions.
I believe your question is answered therein (it's what Michael says).

Here are a few tools in case you want to verify or interrogate the behavior
for yourself.

Start up a local instance.

Then start up a shell, create a table, insert a row, and flush:

durruti:hbase-0.94.7 stack$ ./bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.7, r1471806, Wed Apr 24 18:48:26 PDT 2013

hbase(main):001:0> create 't', 'f'
2013-05-16 22:55:02.804 java[86479:1203] Unable to load realm info from
SCDynamicStore
0 row(s) in 5.5630 seconds

hbase(main):002:0> put 't', 'r', 'f:q', 'some value', 2
0 row(s) in 0.0800 seconds

hbase(main):003:0> flush 't'
0 row(s) in 5.4300 seconds

Check that you have a flushed file with the expected content (my data is in
the default /tmp/hbase-USER dir):

durruti:hbase-0.94.7 stack$ ./bin/hbase \
    org.apache.hadoop.hbase.io.hfile.HFile --printkv -f \
    /tmp/hbase-stack/hbase/t/9820e76663df9e62807ecd88ed8e8588/f/8e7b62f748dc46aca8fb57f6fb153d90
13/05/16 22:57:02 INFO util.ChecksumType: Checksum can use
java.util.zip.CRC32
2013-05-16 22:57:02.467 java[86593:1203] Unable to load realm info from
SCDynamicStore
13/05/16 22:57:02 INFO hfile.CacheConfig: Allocating LruBlockCache with
maximum size 246.9m
13/05/16 22:57:02 ERROR metrics.SchemaMetrics: Inconsistent configuration.
Previous configuration for using table name in metrics: true, new
configuration: false
K: r/f:q/2/Put/vlen=10/ts=0 V: some value
Scanned kv count -> 1

Back in the shell, do another insert and flush with a different value at the
same timestamp of '2':

hbase(main):004:0> put 't', 'r', 'f:q', 'more recently written value', 2
0 row(s) in 0.0050 seconds

# Throw in a flush if you want...

hbase(main):011:0> get 't', 'r', {COLUMN => 'f:q', VERSIONS => 1}
COLUMN                                               CELL
 f:q                                                 timestamp=2, value=more recently written value
1 row(s) in 0.0180 seconds
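
To make the outcome of that get concrete, here is a toy Python model of what the transcript shows. It is not HBase code: Cell, StoreFile, seq_id and read_latest are hypothetical names made up for illustration, and the sketch assumes that when two cells share the same row, column and timestamp, the one from the more recently flushed file is the one returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    timestamp: int
    value: str


@dataclass(frozen=True)
class StoreFile:
    seq_id: int    # hypothetical: a higher number means the file was flushed later
    cells: tuple


def read_latest(store_files, row, column, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs for one column."""
    candidates = [
        (cell.timestamp, sf.seq_id, cell.value)
        for sf in store_files
        for cell in sf.cells
        if cell.row == row and cell.column == column
    ]
    # Newest timestamp first; on a timestamp tie, the newest flush first.
    candidates.sort(key=lambda c: (c[0], c[1]), reverse=True)

    seen_timestamps = set()
    result = []
    for ts, _seq_id, value in candidates:
        if ts in seen_timestamps:
            continue    # same timestamp from an older file: shadowed
        seen_timestamps.add(ts)
        result.append((ts, value))
        if len(result) == max_versions:
            break
    return result


# Same two writes as in the shell session above, each in its own flushed file.
first = StoreFile(seq_id=1, cells=(Cell("r", "f:q", 2, "some value"),))
second = StoreFile(seq_id=2, cells=(Cell("r", "f:q", 2, "more recently written value"),))

print(read_latest([first, second], "r", "f:q"))
```

The order in which the files are passed in does not matter in this model; only the assumed flush order (seq_id) decides which of the two colliding cells is visible.
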
St.Ack