

Re: keyvalue cache
I think you are right that if you replicate the row MVCC semantics in the
cache, then you can get a consistent view. I was referring to a more
client-side approach.

I guess the takeaways are:
 - forget about scans and shoot for point gets, which I agree with
 - per-kv cache overhead might be huge, but it is still worth trying out (see
the rough sizing sketch after this message)
 - can also be architected on top of coprocessors
 - might be complex to implement, but some use cases would still benefit a lot
 - row cache / family cache / kv cache

Thanks for all the input!

Enis
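
A rough back-of-envelope sketch of the "per-kv cache overhead" point above,
assuming the 0.9x KeyValue serialization layout. The example cell sizes and the
~50-byte per-entry heap figure are illustrative assumptions, not measurements
from this thread.

public class KvCacheOverheadEstimate {
    // Hypothetical cell sizes: 16B row, 2B family, 8B qualifier, 60B value.
    static final int ROW = 16, FAMILY = 2, QUALIFIER = 8, VALUE = 60;

    public static void main(String[] args) {
        // Fixed KeyValue framing: 4B key length + 4B value length + 2B row
        // length + 1B family length + 8B timestamp + 1B type = 20 bytes.
        int framing = 4 + 4 + 2 + 1 + 8 + 1;
        int serializedKv = framing + ROW + FAMILY + QUALIFIER + VALUE;  // 106 bytes
        // Assumed per-entry heap cost of an on-heap map cache (entry object,
        // key object, references) on a 64-bit JVM -- an assumption, not measured.
        int heapPerEntry = 50;
        System.out.printf("value=%dB serializedKV=%dB cached~%dB (%.1fx the value)%n",
            VALUE, serializedKv, serializedKv + heapPerEntry,
            (serializedKv + heapPerEntry) / (double) VALUE);
    }
}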

On Wed, Apr 4, 2012 at 3:28 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> A client-side memcached setup can stay pretty consistent if you send all of
> your puts and deletes through it before sending them to hbase, but yeah, I
> guess you lose strict consistency under heavy read/write from multiple
> simultaneous clients.  But, like Andy is saying, if you route the requests
> through the regionserver and it talks to memcached/hazelcast, couldn't that
> be fully consistent?
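
A minimal sketch of the client-side write-through idea above: every Put/Delete
updates the external cache before it goes to HBase. The KvCache interface is a
hypothetical stand-in for a memcached/Hazelcast client, and the caller supplies
the cache key and value explicitly to keep the sketch short; as noted above,
concurrent writers can still interleave, so this is only approximately
consistent.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;

// Hypothetical stand-in for the external cache client (memcached/Hazelcast).
interface KvCache {
    void put(byte[] cacheKey, byte[] value);
    void evict(byte[] cacheKey);
}

class WriteThroughTable {
    private final HTableInterface table;
    private final KvCache cache;

    WriteThroughTable(HTableInterface table, KvCache cache) {
        this.table = table;
        this.cache = cache;
    }

    // Cache first, then HBase, mirroring "send all of your puts and deletes
    // through it before sending them to hbase". Two clients racing on the
    // same key can still interleave, so this is not strictly consistent.
    void put(Put put, byte[] cacheKey, byte[] value) throws IOException {
        cache.put(cacheKey, value);
        table.put(put);
    }

    void delete(Delete delete, byte[] cacheKey) throws IOException {
        cache.evict(cacheKey);
        table.delete(delete);
    }
}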
>
>
> On Wed, Apr 4, 2012 at 3:09 PM, Andrew Purtell <[EMAIL PROTECTED]>
> wrote:
>
> > I thought about trying this out once with a coprocessor, hooking the Gets,
> > with an embedded Hazelcast. That would just be a proof of concept. The idea
> > is to scale the KV cache independent of regionserver limits (maybe we're
> > only giving 1 GB per RS to the value cache and a 10 GB region is hot) and
> > the next step could be modifying the client to spread read load over
> > replicas (HBASE-2357). This doesn't consider scans either.
> >
> >
> > Best regards,
> >
> >     - Andy
> >
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
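
A sketch of the proof of concept Andy describes above: a RegionObserver that
consults an embedded Hazelcast map on point Gets and bypasses the normal read
path on a hit. Method names follow the post-0.96 observer API; the row-keyed
cache and the single-cell assumption are simplifications for illustration, not
part of the original proposal.

import java.io.IOException;
import java.util.List;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.IMap;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

public class HazelcastGetCacheObserver extends BaseRegionObserver {
    // Embedded Hazelcast instance; a real PoC would configure and share this
    // rather than creating it per observer.
    private final IMap<String, byte[]> cache =
        Hazelcast.newHazelcastInstance().getMap("kv-cache");

    @Override
    public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> c,
                         Get get, List<Cell> results) throws IOException {
        byte[] hit = cache.get(Bytes.toStringBinary(get.getRow()));
        if (hit != null) {
            // Simplification: assume a single-cell row and hand back the value.
            results.add(CellUtil.createCell(get.getRow(), hit));
            c.bypass();  // skip the normal MemStore/HFile read path
        }
    }

    @Override
    public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> c,
                          Get get, List<Cell> results) throws IOException {
        if (!results.isEmpty()) {  // populate the cache after a miss
            cache.put(Bytes.toStringBinary(get.getRow()),
                      CellUtil.cloneValue(results.get(0)));
        }
    }
}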
> >
> >
> >
> > >________________________________
> > > From: Matt Corgan <[EMAIL PROTECTED]>
> > >To: [EMAIL PROTECTED]
> > >Sent: Wednesday, April 4, 2012 2:46 PM
> > >Subject: Re: keyvalue cache
> > >
> > >It could act like a HashSet of KeyValues keyed on the
> > >rowKey+family+qualifier but not including the timestamp.  As writes come in
> > >it would evict or overwrite previous versions (read-through vs
> > >write-through).  It would only service point queries where the
> > >row+fam+qualifier are specified, returning the latest version.  Wouldn't be
> > >able to do a typical rowKey-only Get (scan behind the scenes) because it
> > >wouldn't know if it contained all the cells in the row, but if you could
> > >specify all your row's qualifiers up-front it could work.
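
A minimal sketch of the structure described above, assuming a plain in-heap
map: cells keyed on row + family + qualifier with the timestamp excluded, the
newest version overwriting the old one on write, and reads answered only when
the full coordinate is supplied. The class name and the flat string key are
illustrative choices, not anything proposed in the thread.

import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

class PointGetKvCache {
    private final ConcurrentHashMap<String, KeyValue> cells = new ConcurrentHashMap<>();

    private static String key(byte[] row, byte[] family, byte[] qualifier) {
        return Bytes.toStringBinary(row) + '/'
             + Bytes.toStringBinary(family) + '/'
             + Bytes.toStringBinary(qualifier);
    }

    // Write path: overwrite whatever version was cached for this coordinate.
    void onWrite(KeyValue kv) {
        cells.put(key(kv.getRow(), kv.getFamily(), kv.getQualifier()), kv);
    }

    // Read path: only fully-specified point lookups; a rowKey-only Get cannot
    // be answered because the cache cannot know it holds the whole row.
    KeyValue get(byte[] row, byte[] family, byte[] qualifier) {
        return cells.get(key(row, family, qualifier));
    }
}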
> > >
> > >
> > >On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov
> > ><[EMAIL PROTECTED]>wrote:
> > >
> > >> 1. 2KB can be too large for some applications. For example, some of our
> > >> k-v sizes are < 100 bytes combined.
> > >> 2. These tables (from 1.) do not benefit from block cache at all (we did
> > >> not try 100 B block size yet :)
> > >> 3. And Matt is absolutely right: small block size is expensive
> > >>
> > >> How about doing point queries on the K-V cache and bypassing the K-V
> > >> cache on all Scans (when someone really needs this)?
> > >> Implement the K-V cache as a coprocessor application?
> > >>
> > >> Invalidation of a K-V entry is not necessary if all upsert operations go
> > >> through the K-V cache first, if it sits in front of the MemStore.
> > >> There will be no "stale or invalid" data situation in this case. Correct?
> > >> No need for data to be sorted and no need for data to be merged
> > >> into a scan (we do not use the K-V cache for Scans).
> > >>
> > >>
> > >> Best regards,
> > >> Vladimir Rodionov
> > >> Principal Platform Engineer
> > >> Carrier IQ, www.carrieriq.com
> > >> e-mail: [EMAIL PROTECTED]
> > >>
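
A sketch of the ordering Vladimir describes above, assuming it is done in a
regionserver-side coprocessor: hook each upsert and refresh the KV cache before
the edit reaches the MemStore, so the cache never needs a separate invalidation
pass. The prePut signature follows the post-0.98 RegionObserver API, and the
static CACHE map is a hypothetical stand-in for the real KV cache.

import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteThroughKvCacheObserver extends BaseRegionObserver {
    // row/family/qualifier -> latest value; stand-in for the real KV cache.
    static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    @Override
    public void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put,
                       WALEdit edit, Durability durability) throws IOException {
        // Runs before the edit is applied to the MemStore: every upsert
        // refreshes the cache first, so point reads served from the cache
        // are never stale and no invalidation step is needed.
        for (List<Cell> cells : put.getFamilyCellMap().values()) {
            for (Cell cell : cells) {
                String key = Bytes.toStringBinary(CellUtil.cloneRow(cell)) + '/'
                           + Bytes.toStringBinary(CellUtil.cloneFamily(cell)) + '/'
                           + Bytes.toStringBinary(CellUtil.cloneQualifier(cell));
                CACHE.put(key, CellUtil.cloneValue(cell));
            }
        }
    }
}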
> > >> ________________________________________
> > >> From: Matt Corgan [[EMAIL PROTECTED]]
> > >> Sent: Wednesday, April 04, 2012 11:40 AM
> > >> To: [EMAIL PROTECTED]
> > >> Subject: Re: keyvalue cache
> > >>
> > >> I guess the benefit of the KV cache is that you are not holding entire 64K
> > >> blocks in memory when you only care about 200 bytes of them.  Would an