HBase >> mail # dev >> Re: A general question on maxVersion handling when we have Secondary index tables


Re: A general question on maxVersion handling when we have Secondary index tables
Thanks for sharing a pointer to your implementation.
My two cents:
Timestamps are a way to do MVCC, and setting every KV to the same TS will
make concurrency control very tricky and error-prone, if not impossible.
I think Ram is talking about the dead entries in the index table rather than
the data table. Deleting old index entries up front when there is a new put
might be an option.
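
A minimal sketch of that delete-before-put idea, using plain Python dicts as stand-ins for the data table and the index table. The names `data_table`, `index_table` and `put_with_index` are hypothetical and are not HBase API; the sketch also ignores concurrency, which is exactly where the MVCC concern above would apply.

```python
# Toy stand-ins for the two tables; this is not the HBase client API.
data_table = {}    # row key -> current value
index_table = {}   # indexed value -> set of row keys holding that value


def put_with_index(row, value):
    """Write `value` for `row`, removing the stale index entry first."""
    old = data_table.get(row)
    if old is not None and old != value:
        rows = index_table.get(old, set())
        rows.discard(row)
        if not rows:
            # No row holds the old value any more: drop the dead index entry.
            index_table.pop(old, None)
    data_table[row] = value
    index_table.setdefault(value, set()).add(row)


# Replay the three puts from the original question.
for v in ("Val1", "Val2", "Val3"):
    put_with_index("Row1", v)

print(data_table)
print(index_table)
```

After the three puts, only the entry for Val3 is left in the toy index, matching what a maxVersions = 1 read of the data table would return.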
Best Regards,
Wei

Wei Tan
Research Staff Member
IBM T. J. Watson Research Center
19 Skyline Dr, Hawthorne, NY  10532
[EMAIL PROTECTED]; 914-784-6752

From:   Jesse Yates <[EMAIL PROTECTED]>
To:     [EMAIL PROTECTED],
Date:   08/28/2012 04:00 AM
Subject:        Re: A general question on maxVersion handling when we have Secondary index tables

Ram,

If I understand correctly, I think you can design your index such that you
don't actually use the timestamp (e.g. everything gets put with a TS = 10, or
some other non-special, relatively small number that's not 0, as I'd worry
about that in HBase ;). Then when you set maxVersions to 1, everything
should be good.
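
A toy model of the fixed-timestamp idea, for illustration only. It assumes that successive index writes land in the same cell and that a write with an identical timestamp replaces the earlier one; `VersionedCell` and `FIXED_TS` are hypothetical names, not HBase classes, and the pruning here happens eagerly on every put, which is a simplification.

```python
class VersionedCell:
    """Toy model of one cell that keeps at most `max_versions` timestamps."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, value, ts):
        # Same timestamp: the later write replaces the earlier one.
        self.versions[ts] = value
        # Keep only the newest `max_versions` timestamps.
        for old_ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[old_ts]

    def get(self):
        # Return the value stored under the newest timestamp.
        return self.versions[max(self.versions)]


FIXED_TS = 10  # arbitrary small non-zero constant, as suggested above

cell = VersionedCell(max_versions=1)
for v in ("Val1", "Val2", "Val3"):
    cell.put(v, FIXED_TS)

print(cell.versions)
```

Because every write uses the same timestamp, the model never holds more than one version, so there is nothing older to clean up.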

You get a couple of wasted bytes from the TS, but with the prefixTrie stuff
that should be pretty minimal overhead. If you do need to keep track of the
timestamp, you should be able to munge that back up into the column
qualifier (and just know that the last 64 bits are the timestamp). Again, a
little more CPU cost, but it's really not that big of an overhead. It seems
like you don't really care about the TS though, in which case this should
be pretty simple.
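
One way the "last 64 bits of the qualifier" trick could look, sketched in Python with byte strings. The helper names `encode_qualifier` and `decode_qualifier` are hypothetical, and the big-endian signed 64-bit layout is an assumption; the thread does not specify an encoding.

```python
import struct

TS_LEN = 8  # 64 bits


def encode_qualifier(qualifier: bytes, ts: int) -> bytes:
    """Append the timestamp as a big-endian signed 64-bit suffix."""
    return qualifier + struct.pack(">q", ts)


def decode_qualifier(raw: bytes):
    """Split a stored qualifier back into (qualifier, timestamp)."""
    if len(raw) < TS_LEN:
        raise ValueError("qualifier is too short to carry a timestamp")
    (ts,) = struct.unpack(">q", raw[-TS_LEN:])
    return raw[:-TS_LEN], ts


raw = encode_qualifier(b"idx", 1346140800000)
print(len(raw))
print(decode_qualifier(raw))
```

The extra CPU cost mentioned above is the pack and unpack on each write and read; the real cell timestamp can then be held at a constant.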

Out of curiosity, what are people using for their secondary indexing
solutions? I know there are a bunch out there, but don't know what people
have adopted, what they like/dislike, design tradeoffs made and why.

Disclaimer: I recently proposed a secondary indexing solution myself
(shameless self-plug:
http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html)
and it's something I'm working on for Salesforce - open sourced at some
point, promise!

-Jesse
-------------------
Jesse Yates
@jesse_yates
jyates.github.com
On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
[EMAIL PROTECTED]> wrote:

> Hi All
>
> When we try to build any type of secondary index for a given table, how can
> one handle maxVersions in the secondary index tables?
>
> For example, I have inserted
>
> Row1 - Val1 => t
> Row1 - Val2 => t+1
> Row1 - Val3 => t+2
>
> Ideally, if my max versions is only one, then Val3 should be my result if I
> query the main table for Row1.
>
> Now in my index I will have all of the above 3 entries. How can we
> remove the older entries from the index table that do not fit into
> maxVersions?
>
> Currently, while scanning, the code that skips entries beyond max versions
> does not give any hooks to know which entries were skipped.
>
> So any suggestions on this? I am still looking through the code for other
> options, but suggestions are welcome.
>
> Regards
>
> Ram
