HBase >> mail # dev >> A general question on maxVersion handling when we have Secondary index tables


RE: A general question on maxVersion handling when we have Secondary index tables
Yes Jon.  You got it right.  This is the problem.  But in every
implementation we need some mechanism that goes through all the rows in
the sec index table.
Suppose, in the example below, maxVersions is 5 in my main table.  Then I
scan the newest 5 values from the sec index table, and once I get the 6th
value I need to delete the oldest one from the sec index table.  This
involves some type of cache or map in which I keep incrementing a count
for every row that we get, and whenever the count goes above maxVersions
I delete the oldest entry.
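
The counting idea above can be sketched roughly as follows.  This is only an
in-memory illustration in Python, not HBase code: the function name
expired_index_entries and the (index_row, main_row, timestamp) tuple layout
are assumptions made for the example, and a single indexed column is assumed
so that the main row key alone identifies the cell.

```python
import heapq
from collections import defaultdict


def expired_index_entries(index_entries, max_versions):
    """Return the index entries that fall outside the newest max_versions.

    index_entries: iterable of (index_row, main_row, timestamp) tuples, in
    whatever order the index scan returns them (hypothetical layout).
    """
    # The "cache or map": main_row -> min-heap of (timestamp, index_row).
    # The heap length is the running count for that main row.
    newest = defaultdict(list)
    expired = []
    for index_row, main_row, ts in index_entries:
        heap = newest[main_row]
        heapq.heappush(heap, (ts, index_row))
        if len(heap) > max_versions:
            # Count exceeded maxVersions: the oldest entry becomes a delete.
            old_ts, old_index_row = heapq.heappop(heap)
            expired.append((old_index_row, main_row, old_ts))
    return expired


entries = [("val1", "row1", 1), ("val2", "row1", 2), ("val3", "row1", 3)]
stale = expired_index_entries(entries, max_versions=1)
```

The map holds up to maxVersions entries for every main row seen during the
scan, which is where the caching cost comes from.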

We also thought of another option, though it is slower:
-> Scan one row in the sec index table.
-> Extract the actual row key of the main table and scan the main table
using that.  Here I will get only the required version entries.
-> Based on these entries, delete the expired entries from the sec index
table.
We thought of doing this at (major) compaction time.
But this has one problem: whenever we do a compaction we deal directly
with store-level scanners.  Even if we try to use the new hook added by
Lars H, preCompactScannerOpen(), the scanner always expects the KVs to be
ordered, and we may not be able to get them in order if we do it the way
described here.
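
The three-step cross-check described above, leaving the compaction question
aside, might look like the sketch below.  Again this is a plain Python
illustration and not HBase code: both tables are modelled as dicts, the name
stale_index_entries is hypothetical, and trimming the main-table list to
max_versions stands in for what a versioned read of the main table would
return.

```python
def stale_index_entries(index_table, main_table, max_versions):
    """Find index entries whose main-table version is no longer retained.

    index_table: dict of indexed value -> list of (main_row, timestamp)
    main_table:  dict of main_row -> list of (timestamp, value)
    Both layouts are assumptions made for this example.
    """
    stale = []
    # Step 1: scan one row of the index table at a time.
    for index_row, refs in index_table.items():
        # Step 2: extract the main-table row key and read the main table.
        for main_row, ts in refs:
            versions = sorted(main_table.get(main_row, []), reverse=True)
            live = set(versions[:max_versions])  # only the retained versions
            # Step 3: an index entry with no matching live version is expired.
            if (ts, index_row) not in live:
                stale.append((index_row, main_row, ts))
    return stale


main = {"row1": [(1, "val1"), (2, "val2"), (3, "val3")]}
index = {"val1": [("row1", 1)], "val2": [("row1", 2)], "val3": [("row1", 3)]}
to_delete = stale_index_entries(index, main, max_versions=1)
```

This needs no long-lived cache, but it costs one main-table read per index
entry, which is why it is the slower option.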

We also felt that if we had a hook at the point where the expired KVs are
filtered out, maybe we could use that.  But we need to check how efficient
it would be.

So the suggestion given by Jon is one of the options, but it involves more
caching, and we may also need a persistent cache if the size keeps
increasing.

Thanks to all for providing your suggestions.  

Regards
Ram
> -----Original Message-----
> From: Jonathan Hsieh [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, August 29, 2012 11:16 PM
> To: [EMAIL PROTECTED]
> Subject: Re: A general question on maxVersion handling when we have
> Secondary index tables
>
> Let me rephrase to make sure I'm on the same page for Ram's question:
>
> We do three inserts on row 1 at different times to the same column (which
> is being indexed in a secondary table).  (Are we assuming only a 1-to-1
> secondary->primary mapping?)
>
> t1 < t2 < t3
> put ("row1", "cf:c", "val1", t1)
> put ("row1", "cf:c", "val2", t2)
> put ("row1", "cf:c", "val3", t3)
>
> What happens is in the primary table we have:
>
> row1 / cf:c = val1 @ t1
> row1 / cf:c = val2 @ t2
> row1 / cf:c = val3 @ t3
>
> I'm assuming that these writes happen to a secondary table like this:
> put ("val1", "r", "row1", t1)
> put ("val2", "r", "row1", t2)
> put ("val3", "r", "row1", t3)
>
> and in the secondary table we have:
>
> val1 / r = row1 @ t1
> val2 / r = row1 @ t2
> val3 / r = row1 @ t3
>
> The core question is how and when can we efficiently and correctly get rid
> of the now invalid val1, val2 rows in the index table.
>
> Let's look at some of the strawmen:
> 1) periodic scan of the secondary table that adds delete markers for
> invalid entries (removed on major compact)
> 2) lazily add a delete marker on reads that are invalid (we are @t4, attempt
> to read via "val2" in 2ndary index, see primary value is invalid so do a
> checkAndDelete val2 from 2ndary).  would get removed on major compact.
> 3) delete on update.  This means we need to know if we are modifying a
> value and thus incurs at least an extra read per write.
>
> Ram, does this seem like the right question and potential options to
> consider?
>
> Jon.
>
> On Wed, Aug 29, 2012 at 8:12 AM, Ramkrishna.S.Vasudevan <
> [EMAIL PROTECTED]> wrote:
>
> > When we have a many-to-one mapping between the main and secondary index
> > table, maybe we will end up hitting many RS.  If there is a one-to-one
> > mapping, maybe that is not a problem.
> >
> > Basically my intention in this discussion was to discuss version
> > maintenance on any type of secondary index, particularly removing the
> > stale data in the index table that would have expired.
> >
> > Regards
> > Ram
> >
> >
> > > -----Original Message-----