
HBase, mail # dev - keyvalue cache


Re: keyvalue cache
Matt Corgan 2012-04-04, 22:28
A client-side memcached setup can stay pretty consistent if you send all of
your puts and deletes through it before sending them to HBase, but yeah, I
guess you lose strict consistency under heavy read/write from multiple
simultaneous clients.  But, like Andy is saying, if you route the requests
through the regionserver and it talks to memcached/hazelcast, couldn't that
be fully consistent?
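To make the consistency point concrete, here is a minimal sketch under stated assumptions: two `HashMap`s stand in for HBase and a shared memcached (no real client library is used), and the class name `WriteOrderingDemo` is hypothetical. With client-side write-through, two clients' cache and store writes can interleave so that the cache ends up holding B while HBase holds A. When the regionserver performs both updates under one lock, that interleaving cannot happen.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustration only: HashMaps stand in for HBase ("store") and a shared
// memcached instance ("cache"); no real client API is used here.
class WriteOrderingDemo {

    // Client-side write-through: each client writes the cache, then the store,
    // as two separate steps that another client can interleave with.
    static String[] clientSideRace() {
        Map<String, String> sharedCache = new HashMap<>();
        Map<String, String> hbase = new HashMap<>();
        sharedCache.put("r1/f:q", "A"); // client A writes the cache
        sharedCache.put("r1/f:q", "B"); // client B writes the cache
        hbase.put("r1/f:q", "B");       // client B writes the store
        hbase.put("r1/f:q", "A");       // client A's store write lands last
        return new String[] { sharedCache.get("r1/f:q"), hbase.get("r1/f:q") };
    }

    // Server-side routing: one lock covers both updates, so every write
    // reaches cache and store in the same order.
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> store = new HashMap<>();

    synchronized void put(String key, String value) {
        cache.put(key, value);
        store.put(key, value);
    }

    synchronized boolean consistent(String key) {
        return Objects.equals(cache.get(key), store.get(key));
    }

    public static void main(String[] args) throws InterruptedException {
        String[] r = clientSideRace();
        System.out.println("client-side: cache=" + r[0] + ", store=" + r[1]);

        WriteOrderingDemo server = new WriteOrderingDemo();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) server.put("r1/f:q", "A" + i); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) server.put("r1/f:q", "B" + i); });
        a.start(); b.start();
        a.join(); b.join();
        System.out.println("server-side consistent: " + server.consistent("r1/f:q"));
    }
}
```

Running `main` prints `cache=B, store=A` for the client-side interleaving and `true` for the server-side check. The lock here only covers writes; a real design would also need to handle races between a cache fill after a read miss and a concurrent write.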
On Wed, Apr 4, 2012 at 3:09 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:

> I thought about trying this out once with a coprocessor, hooking the Gets,
> with an embedded Hazelcast. That would just be a proof of concept. The idea
> is to scale the KV cache independent of regionserver limits (maybe we're
> only giving 1 GB per RS to the value cache and a 10 GB region is hot) and
> the next step could be modifying the client to spread read load over
> replicas (HBASE-2357). This doesn't consider scans either.
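A rough sketch of how a value cache could scale independently of regionserver heap, assuming cell keys are hash-partitioned across N cache nodes. `CachePartitioner` is hypothetical and is not Hazelcast's API; `String.hashCode` is only a stand-in for a better-distributed hash. With the numbers above (a 10 GB hot region, 1 GB of value cache per node), at least ten nodes' worth of cache would be needed, ignoring per-entry overhead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: spreads cached cells over N independent cache nodes by
// hashing the cell key, so total cache capacity grows with N rather than with
// the heap of the regionserver that hosts the region. Not Hazelcast's API.
class CachePartitioner {
    private final int nodes;

    CachePartitioner(int nodes) {
        if (nodes <= 0) throw new IllegalArgumentException("nodes must be > 0");
        this.nodes = nodes;
    }

    // floorMod keeps the node index non-negative even for negative hash codes.
    int nodeFor(String cellKey) {
        return Math.floorMod(cellKey.hashCode(), nodes);
    }

    // Count how many of the given keys land on each node.
    int[] histogram(List<String> keys) {
        int[] counts = new int[nodes];
        for (String k : keys) counts[nodeFor(k)]++;
        return counts;
    }

    public static void main(String[] args) {
        // Ten nodes, e.g. ten 1 GB caches for one 10 GB hot region.
        CachePartitioner p = new CachePartitioner(10);
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) keys.add("row" + i + "/f:q");
        int[] counts = p.histogram(keys);
        for (int n = 0; n < counts.length; n++) {
            System.out.println("node " + n + ": " + counts[n] + " cells");
        }
    }
}
```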
>
>
> Best regards,
>
>     - Andy
>
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
>
> >________________________________
> > From: Matt Corgan <[EMAIL PROTECTED]>
> >To: [EMAIL PROTECTED]
> >Sent: Wednesday, April 4, 2012 2:46 PM
> >Subject: Re: keyvalue cache
> >
> >It could act like a HashSet of KeyValues keyed on the
> >rowKey+family+qualifier but not including the timestamp.  As writes come
> >in it would evict or overwrite previous versions (read-through vs
> >write-through).  It would only service point queries where the
> >row+fam+qualifier are specified, returning the latest version.  Wouldn't
> >be able to do a typical rowKey-only Get (scan behind the scenes) because it
> >wouldn't know if it contained all the cells in the row, but if you could
> >specify all your row's qualifiers up-front it could work.
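A minimal sketch of the cache described above, assuming `String` cells instead of HBase `byte[]` `KeyValue`s; `KvCache`, `CellKey`, and the `WritePolicy` names are hypothetical. The key deliberately omits the timestamp, `EVICT_ON_WRITE` corresponds to the read-through variant and `WRITE_THROUGH` to the overwrite variant, and only fully specified point queries are answered.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Sketch only: Strings model HBase byte[] cells; all names are hypothetical.
class KvCache {

    // Key = row + family + qualifier. The timestamp is excluded, so each
    // cell coordinate holds at most one (the latest) version.
    static final class CellKey {
        final String row, family, qualifier;
        CellKey(String row, String family, String qualifier) {
            this.row = row; this.family = family; this.qualifier = qualifier;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof CellKey)) return false;
            CellKey k = (CellKey) o;
            return row.equals(k.row) && family.equals(k.family) && qualifier.equals(k.qualifier);
        }
        @Override public int hashCode() { return Objects.hash(row, family, qualifier); }
    }

    static final class Cell {
        final long timestamp;
        final String value;
        Cell(long timestamp, String value) { this.timestamp = timestamp; this.value = value; }
    }

    enum WritePolicy { EVICT_ON_WRITE, WRITE_THROUGH }

    private final Map<CellKey, Cell> cells = new HashMap<>();
    private final WritePolicy policy;

    KvCache(WritePolicy policy) { this.policy = policy; }

    // Called for every incoming write to the cell.
    void onWrite(String row, String family, String qualifier, long ts, String value) {
        CellKey key = new CellKey(row, family, qualifier);
        if (policy == WritePolicy.EVICT_ON_WRITE) {
            cells.remove(key); // next read misses and repopulates from HBase
        } else {
            // Overwrite, but keep whichever version has the newer timestamp.
            cells.merge(key, new Cell(ts, value),
                (old, incoming) -> incoming.timestamp >= old.timestamp ? incoming : old);
        }
    }

    // Point query only: row, family, and qualifier must all be given. A
    // row-only Get cannot be served, since the cache never knows whether it
    // holds every cell of the row.
    Optional<String> get(String row, String family, String qualifier) {
        Cell c = cells.get(new CellKey(row, family, qualifier));
        return c == null ? Optional.empty() : Optional.of(c.value);
    }

    // Read-through fill after a miss has been served from HBase.
    void fill(String row, String family, String qualifier, long ts, String value) {
        cells.putIfAbsent(new CellKey(row, family, qualifier), new Cell(ts, value));
    }
}
```

Under `WRITE_THROUGH`, a late-arriving write with an older timestamp does not replace the newer cached version; under `EVICT_ON_WRITE`, any write simply drops the entry so the next read goes back to HBase.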
> >
> >
> >On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov
> ><[EMAIL PROTECTED]> wrote:
> >
> >> 1. 2KB can be too large for some applications. For example, some of our
> >> k-v pairs are < 100 bytes combined.
> >> 2. These tables (from 1.) do not benefit from the block cache at all (we
> >> did not try a 100 B block size yet :)
> >> 3. And Matt is absolutely right: a small block size is expensive.
> >>
> >> How about doing point queries on the K-V cache and bypassing the K-V cache
> >> on all Scans (when someone really needs this)?
> >> Implement the K-V cache as a coprocessor application?
> >>
> >> Invalidation of a K-V entry is not necessary if all upsert operations go
> >> through the K-V cache first, with the cache sitting in front of the MemStore.
> >> There will be no "stale or invalid" data situation in this case. Correct?
> >> No need for data to be sorted and no need for data to be merged
> >> into a scan (we do not use K-V cache for Scans)
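One way to picture the routing proposed above, as a single-threaded sketch: a sorted `TreeMap` stands in for the MemStore, an unbounded `HashMap` for the K-V cache, and `CachedRegion` is a hypothetical name. Upserts and deletes hit the cache first and then the store, point gets are served from the cache when possible, and scans read only the sorted store. The "no invalidation needed" property holds here because updates are serialized; with concurrent writers it would depend on the regionserver serializing updates per row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical routing layer. "store" is a sorted TreeMap standing in for the
// MemStore/HFiles; "cache" is an unsorted, unbounded HashMap standing in for
// the K-V cache. Keys are plain strings such as "row/family:qualifier".
class CachedRegion {
    private final Map<String, String> cache = new HashMap<>();
    private final NavigableMap<String, String> store = new TreeMap<>();
    int cacheHits = 0;

    // Every upsert updates the cache first, then the store, so a cached
    // entry is never older than the store's copy and needs no invalidation.
    void put(String key, String value) {
        cache.put(key, value);
        store.put(key, value);
    }

    // Deletes take the same path so the cache cannot resurrect a deleted cell.
    void delete(String key) {
        cache.remove(key);
        store.remove(key);
    }

    // Point query: try the cache, then fall back to the store and populate.
    String get(String key) {
        String v = cache.get(key);
        if (v != null) { cacheHits++; return v; }
        v = store.get(key);
        if (v != null) cache.put(key, v);
        return v;
    }

    // Scans bypass the cache entirely and read the sorted store, so cache
    // entries never need to be sorted or merged into a scan.
    NavigableMap<String, String> scan(String startInclusive, String stopExclusive) {
        return store.subMap(startInclusive, true, stopExclusive, false);
    }
}
```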
> >>
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [EMAIL PROTECTED]
> >>
> >> ________________________________________
> >> From: Matt Corgan [[EMAIL PROTECTED]]
> >> Sent: Wednesday, April 04, 2012 11:40 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: keyvalue cache
> >>
> >> I guess the benefit of the KV cache is that you are not holding entire 64K
> >> blocks in memory when you only care about 200 bytes of them.  Would an
> >> alternative be to set a small block size (2KB or less)?
> >>
> >> The problems with small block sizes would be expensive block cache
> >> management overhead and inefficient scanning IO due to lack of read-ahead.
> >> Maybe improving the cache management and read-ahead would be more general
> >> improvements that don't add as much complexity?
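Back-of-the-envelope numbers for the block-size trade-off, assuming the 200-byte cell from the paragraph above, the 10 GB region from Andy's message, and 64 KB vs 2 KB blocks (`BlockSizeMath` is just an illustrative name). Only about 0.31% of a cached 64 KB block is that cell, versus about 9.77% of a 2 KB block, but covering the region takes 163,840 blocks at 64 KB and 5,242,880 at 2 KB, i.e. 32 times as many block index and block cache entries to manage.

```java
// Illustrative arithmetic only; not an HBase API.
class BlockSizeMath {
    // Fraction of a cached block that is the cell we actually wanted.
    static double usefulFraction(int cellBytes, int blockBytes) {
        return (double) cellBytes / blockBytes;
    }

    // Blocks (and hence block index / block cache entries) needed to cover a region.
    static long blockCount(long regionBytes, int blockBytes) {
        return (regionBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    public static void main(String[] args) {
        long region = 10L * 1024 * 1024 * 1024; // 10 GB
        for (int block : new int[] {64 * 1024, 2 * 1024}) {
            System.out.printf("block=%6d B  useful=%.2f%%  blocks=%,d%n",
                block, 100 * usefulFraction(200, block), blockCount(region, block));
        }
    }
}
```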
> >>
> >> I'm having a hard time envisioning how you would do invalidations on the KV
> >> cache and how you would merge its entries into a scan, etc.  Would it
> >> basically be a memstore in front of the memstore where KVs get individually
> >> invalidated instead of bulk-flushed?  Would it be sorted or hashed?
> >>
> >> Matt
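A small sketch of why the "sorted or hashed?" question matters for scans, assuming string keys and ignoring deletes and tombstones; `ScanMerge` is a hypothetical name. A sorted cache can contribute its in-range entries to a scan through `subMap`, with cached values assumed to be at least as new as the store's. A hashed cache has no key order, so finding its entries for a range means checking every cached key.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical comparison of sorted vs hashed cache layouts for scans.
class ScanMerge {
    // Sorted cache: overlay the cache's [start, stop) range on the store's
    // range. Cached values win because they are assumed to be at least as new.
    static NavigableMap<String, String> mergedScan(NavigableMap<String, String> store,
                                                   NavigableMap<String, String> sortedCache,
                                                   String start, String stop) {
        NavigableMap<String, String> out = new TreeMap<>(store.subMap(start, true, stop, false));
        out.putAll(sortedCache.subMap(start, true, stop, false));
        return out;
    }

    // Hashed cache: no key order, so every cached key must be examined to
    // find the ones that fall inside [start, stop).
    static int hashedEntriesInRange(Map<String, String> hashedCache, String start, String stop) {
        int n = 0;
        for (String k : hashedCache.keySet()) {
            if (k.compareTo(start) >= 0 && k.compareTo(stop) < 0) n++;
        }
        return n;
    }
}
```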
> >>
> >> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote: