RE: A general question on maxVersion handling when we have Secondary index tables
Hi

Yes, I was talking about the dead entry in the index table rather than the
actual data table.

Regards
Ram

> -----Original Message-----
> From: Wei Tan [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, August 28, 2012 9:22 PM
> To: [EMAIL PROTECTED]
> Cc: Sandeep Tata
> Subject: Re: A general question on maxVersion handling when we have Secondary index tables
>
> Thanks for sharing a pointer to your implementation.
> My two cents:
> Timestamps are a way to do MVCC, and setting every KV with the same TS
> will make concurrency control very tricky and error-prone, if not
> impossible.
> I think Ram is talking about the dead entry in the index table rather
> than the data table. Deleting old index entries upfront when there is a
> new put might be a choice.
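
The upfront-delete idea above can be sketched as follows. This is a toy in-memory model in Python, not HBase code: the class `IndexedTable` and its methods are hypothetical names used only for illustration, and the index layout (value mapped to the rows holding it) is an assumption.

```python
class IndexedTable:
    """Toy in-memory model: a data table plus a value -> rows index table."""

    def __init__(self):
        self.data = {}   # row key -> current value (maxVersions = 1)
        self.index = {}  # value -> set of row keys currently holding it

    def put(self, row, value):
        old = self.data.get(row)
        if old is not None and old != value:
            # Delete the stale index entry before writing the new one.
            rows = self.index.get(old, set())
            rows.discard(row)
            if not rows:
                self.index.pop(old, None)
        self.data[row] = value
        self.index.setdefault(value, set()).add(row)

    def lookup(self, value):
        """Return the rows that the index maps to this value."""
        return sorted(self.index.get(value, set()))


table = IndexedTable()
for v in ("Val1", "Val2", "Val3"):
    table.put("Row1", v)
```

After the three puts, only the `Val3` entry remains in the index. Note that the sketch reads the old value before every write and updates two structures in one step; on a real cluster those steps are not atomic, which this model ignores.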
>
>
> Best Regards,
> Wei
>
> Wei Tan
> Research Staff Member
> IBM T. J. Watson Research Center
> 19 Skyline Dr, Hawthorne, NY  10532
> [EMAIL PROTECTED]; 914-784-6752
>
>
>
> From:   Jesse Yates <[EMAIL PROTECTED]>
> To:     [EMAIL PROTECTED],
> Date:   08/28/2012 04:00 AM
> Subject:        Re: A general question on maxVersion handling when we have Secondary index tables
>
>
>
> Ram,
>
> If I understand correctly, I think you can design your index such that
> you don't actually use the timestamp (e.g. everything gets put with a
> TS of 10, or some other non-special, relatively small number that's not
> 0, as I'd worry about that in HBase ;) Then when you set maxVersions to
> 1, everything should be good.
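
A minimal sketch of why a fixed timestamp leaves only one version per cell. This is a toy Python model, not HBase code: `VersionedCell` is a hypothetical name, and the model assumes that a later write at the same timestamp replaces the earlier one. How the index row key itself is laid out is not specified in this thread and is not modeled here.

```python
FIXED_TS = 10  # arbitrary non-zero constant, following the suggestion above


class VersionedCell:
    """Toy model of one cell that keeps at most max_versions timestamps."""

    def __init__(self, max_versions=1):
        if max_versions < 1:
            raise ValueError("max_versions must be at least 1")
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, value, ts):
        # Assumption: a write at an existing timestamp replaces that value.
        self.versions[ts] = value
        # Drop everything except the newest max_versions timestamps.
        for old_ts in sorted(self.versions)[:-self.max_versions]:
            del self.versions[old_ts]

    def get(self):
        """Return the value stored at the newest timestamp."""
        return self.versions[max(self.versions)]


cell = VersionedCell(max_versions=1)
for v in ("Val1", "Val2", "Val3"):
    cell.put(v, FIXED_TS)
```

Because every write lands on the same timestamp, the cell never accumulates older versions to clean up. The concurrency concern raised elsewhere in this thread still applies: with identical timestamps, the store can no longer use the timestamp to order concurrent writers.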
>
> You get a couple of wasted bytes from the TS, but with the prefixTrie
> stuff that should be pretty minimal overhead. If you do need to keep
> track of the timestamp you should be able to munge that back up into the
> column qualifier (and just know that the last 64 bits is the timestamp).
> Again, a little more CPU cost, but it's really not that big of an
> overhead. It seems like you don't really care about the TS though, in
> which case this should be pretty simple.
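
Munging the timestamp into the column qualifier could look roughly like this. It is a sketch in Python using only the standard library; the function names are hypothetical, and the big-endian signed 64-bit encoding is an assumption chosen to match how a Java `long` is normally serialized.

```python
import struct

TS_LEN = 8  # the timestamp occupies the last 64 bits of the qualifier


def encode_qualifier(qualifier: bytes, ts: int) -> bytes:
    """Append the timestamp as a big-endian signed 64-bit integer."""
    return qualifier + struct.pack(">q", ts)


def decode_qualifier(encoded: bytes):
    """Split an encoded qualifier back into (qualifier, timestamp)."""
    if len(encoded) < TS_LEN:
        raise ValueError("encoded qualifier is shorter than 8 bytes")
    (ts,) = struct.unpack(">q", encoded[-TS_LEN:])
    return encoded[:-TS_LEN], ts


encoded = encode_qualifier(b"col", 1346167320000)
qualifier, ts = decode_qualifier(encoded)
```

The fixed 8-byte suffix is what makes decoding unambiguous: the reader does not need a delimiter, only the knowledge that the last 64 bits are the timestamp.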
>
> Out of curiosity, what are people using for their secondary indexing
> solutions? I know there are a bunch out there, but don't know what
> people have adopted, what they like/dislike, design tradeoffs made and
> why.
>
> Disclaimer: I recently proposed a secondary indexing solution myself
> (shameless self-plug:
> http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html)
> and it's something I'm working on for Salesforce - open sourced at some
> point, promise!
>
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> [EMAIL PROTECTED]> wrote:
>
> > Hi All
> >
> >
> >
> > When we try to build any type of secondary index for a given table,
> > how can one handle maxVersions in the secondary index tables?
> >
> >
> >
> > For example, I have inserted:
> >
> > Row1 - Val1 => t
> > Row1 - Val2 => t+1
> > Row1 - Val3 => t+2
> >
> >
> >
> > Ideally, if my max versions is only one, then Val3 should be my result
> > if I query the main table for Row1.
> >
> >
> >
> > Now in my index I will have all 3 of the above entries. How can we
> > remove the older entries from the index table that do not fit into
> > maxVersions?
> >
> >
> >
> > Currently, the scan-time code that skips entries beyond maxVersions
> > does not give any hooks to learn which entries were skipped because of
> > versions.
> >
> > So, any suggestions on this? I am still looking through the code for
> > other options, but suggestions are welcome.
> >
> >
> >
> > Regards
> >
> > Ram
> >
> >