HBase >> mail # user >> HBase Design : Column name v/s Version


Re: HBase Design : Column name v/s Version
Theoretically that could work. However, it does seem like a weird way of doing what you want to do, and you might run into unforeseen issues. One issue I see is that 100k versions sounds a bit scary. For example, you can paginate through columns, but not through versions of the same column.
 
Regards,

Dhaval
----- Original Message -----
From: Sagar Naik <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Dhaval Shah <[EMAIL PROTECTED]>
Cc:
Sent: Friday, 24 January 2014 1:46 PM
Subject: Re: HBase Design : Column name v/s Version

Thanks for clarifying,

I will be using custom version numbers (auto incrementing on the client
side) and not timestamps.
Two clients never update the same row.
-Sagar
On 1/24/14 10:33 AM, "Dhaval Shah" <[EMAIL PROTECTED]> wrote:

>I am talking about schema 2. Schema 1 would definitely work. Schema 2 can
>have version collisions if you decide to use timestamps as versions.
>
>Regards,
>
>Dhaval
>
>
>----- Original Message -----
>From: Sagar Naik <[EMAIL PROTECTED]>
>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; Dhaval Shah
><[EMAIL PROTECTED]>
>Cc:
>Sent: Friday, 24 January 2014 1:07 PM
>Subject: Re: HBase Design : Column name v/s Version
>
>I am not sure I understand you correctly.
>I assume you are talking about schema 1.
>In this case I am appending the version number to the column name.
>
>The column_names are different (data_1/data_2) for value_1 and value_2
>respectively.
>
>
>-Sagar
>
>
>On 1/24/14 9:47 AM, "Dhaval Shah" <[EMAIL PROTECTED]> wrote:
>
>>Versions in HBase are timestamps by default. If you intend to continue
>>using the timestamps, what will happen when someone writes value_1 and
>>value_2 at the exact same time?
>>
>>Regards,
>>
>>Dhaval
>>
>>
>>----- Original Message -----
>>From: Sagar Naik <[EMAIL PROTECTED]>
>>To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
>>Cc:
>>Sent: Friday, 24 January 2014 12:27 PM
>>Subject: HBase Design : Column name v/s Version
>>
>>Hi,
>>
>>I have a choice to maintain data either in column values or as
>>versioned data.
>>This data is not a versioned copy per se.
>>
>>The access pattern is to get all the data every time.
>>
>>So the schema choices are :
>>Schema 1:
>>1. column_name/qualifier => data_1. column_value => value_1
>>1.a. column_name/qualifier => data_2. column_value => value_2,value_2.a
>>
>>1.b. column_name/qualifier => data_3. column_value => value_3
>>
>>To get all the values for "data", I will have to use ColumnPrefixFilter
>>with the prefix set to "data".
>>
>>Schema 2:
>>2. column_name/qualifier => data. version=> 1, column_value => value_1
>>
>>2.a. column_name/qualifier => data. version=> 2, column_value =>
>>value_2,value_2.a
>>
>>2.b. column_name/qualifier => data. version=> 3, column_value => value_3
>>To get all the values for "data", I will do a simple get operation to get
>>all the versions.
>>
>>The number of versions can range from 10 to 100K.
>>
>>Get performance should beat filter performance.
>>Comparing up to 100K values will get costly as the number of versions
>>increases.
>>
>>I would like to know if there are drawbacks in going the version route.
>>
>>
>>
>>
>>-Sagar
>>
>
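For readers following the thread, the difference between the two schemas, and the collision Dhaval mentions, can be sketched with a small in-memory model. This is only an illustration: VersionSketch and its methods (put, latestByPrefix, allVersions) are hypothetical names, not the HBase client API, and the model assumes that a write to an existing (row, qualifier, version) replaces the value already stored there.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy stand-in for a single HBase row: qualifier -> (version -> value).
// Hypothetical names throughout; this is not the HBase client API.
final class VersionSketch {
    private final NavigableMap<String, NavigableMap<Long, String>> columns = new TreeMap<>();

    // Assumption: a write to an existing (qualifier, version) replaces that cell.
    void put(String qualifier, long version, String value) {
        columns.computeIfAbsent(qualifier,
                q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
               .put(version, value);
    }

    // Schema 1 style read: newest value of every column whose qualifier starts with the prefix.
    NavigableMap<String, String> latestByPrefix(String prefix) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> e
                : columns.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted qualifiers: matches are contiguous, so stop at the first miss
            }
            out.put(e.getKey(), e.getValue().firstEntry().getValue());
        }
        return out;
    }

    // Schema 2 style read: every stored version of one column, newest first.
    NavigableMap<Long, String> allVersions(String qualifier) {
        return columns.getOrDefault(qualifier, new TreeMap<Long, String>());
    }

    public static void main(String[] args) {
        long sameInstant = 1390587960000L; // one timestamp shared by two writes

        // Schema 1: the writes land in different qualifiers, so both survive.
        VersionSketch schema1 = new VersionSketch();
        schema1.put("data_1", sameInstant, "value_1");
        schema1.put("data_2", sameInstant, "value_2");
        System.out.println("schema 1 cells: " + schema1.latestByPrefix("data").size());

        // Schema 2 with timestamps as versions: the second write replaces the first.
        VersionSketch collided = new VersionSketch();
        collided.put("data", sameInstant, "value_1");
        collided.put("data", sameInstant, "value_2");
        System.out.println("schema 2 cells after collision: " + collided.allVersions("data").size());

        // Schema 2 with client-assigned version numbers: no collision.
        VersionSketch numbered = new VersionSketch();
        numbered.put("data", 1L, "value_1");
        numbered.put("data", 2L, "value_2,value_2.a");
        numbered.put("data", 3L, "value_3");
        System.out.println("schema 2 cells with custom versions: " + numbered.allVersions("data").size());
    }
}
```

In this model, Sagar's plan of client-side auto-incrementing version numbers avoids the collision as long as each row has a single writer, which matches what he describes. It does not address Dhaval's other concern about reading very large numbers of versions from one column.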
 