Re: keyvalue cache
Not sure about memcached- or coprocessor-based implementations, where you
would lose a consistent view over your data. I think one of the
Lucene-over-HBase implementations uses a memory cache (can't remember if it
was memcached) over HBase index readers and writers. You can do memcached
deployments with zero code change to HBase, but I haven't heard of anyone
other than those guys doing it. Has anyone tried it?
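For what it's worth, the zero-change deployment mentioned above amounts to a
client-side read-through cache. The sketch below is only an illustration of
that pattern, assuming the spymemcached client and the 0.92-era HTable API;
the table name, key format, and TTL are made up:

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReadThroughCache {
      public static void main(String[] args) throws Exception {
        MemcachedClient memcached =
            new MemcachedClient(new InetSocketAddress("localhost", 11211));
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // illustrative table name

        String cacheKey = "mytable:row1:d:q1";        // row + family + qualifier
        byte[] value = (byte[]) memcached.get(cacheKey);
        if (value == null) {                          // miss: read HBase, then populate
          Get get = new Get(Bytes.toBytes("row1"));
          get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("q1"));
          Result r = table.get(get);
          value = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("q1"));
          if (value != null) {
            memcached.set(cacheKey, 300, value);      // cache for 5 minutes
          }
        }
        table.close();
        memcached.shutdown();
      }
    }

This only helps reads and, as noted above, anything written through another
path leaves the memcached copy stale until it expires, which is exactly the
consistency concern.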

On Wed, Apr 4, 2012 at 2:53 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> in the meantime, memcached could provide all those benefits without adding
> any complexity to HBase...
>
>
> On Wed, Apr 4, 2012 at 2:46 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:
>
> > It could act like a HashSet of KeyValues keyed on the
> > rowKey+family+qualifier but not including the timestamp.  As writes come
> > in it would evict or overwrite previous versions (read-through vs
> > write-through).  It would only service point queries where the
> > row+fam+qualifier are specified, returning the latest version.  Wouldn't
> > be able to do a typical rowKey-only Get (scan behind the scenes) because
> > it wouldn't know if it contained all the cells in the row, but if you
> > could specify all your row's qualifiers up-front it could work.
> >
> >
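As an illustration only, a minimal in-memory sketch of the structure Matt
describes above (latest version per row+family+qualifier, timestamps ignored,
point lookups only); the class and method names are invented:

    import java.util.Arrays;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.hadoop.hbase.KeyValue;

    public class PointKeyValueCache {
      // Composite key; equals/hashCode over row, family, qualifier only,
      // so the timestamp never participates in lookups.
      private static final class CellKey {
        final byte[] row, family, qualifier;
        CellKey(byte[] r, byte[] f, byte[] q) { row = r; family = f; qualifier = q; }
        @Override public boolean equals(Object o) {
          if (!(o instanceof CellKey)) return false;
          CellKey k = (CellKey) o;
          return Arrays.equals(row, k.row) && Arrays.equals(family, k.family)
              && Arrays.equals(qualifier, k.qualifier);
        }
        @Override public int hashCode() {
          return 31 * (31 * Arrays.hashCode(row) + Arrays.hashCode(family))
              + Arrays.hashCode(qualifier);
        }
      }

      private final ConcurrentHashMap<CellKey, KeyValue> cells =
          new ConcurrentHashMap<CellKey, KeyValue>();

      // Writes overwrite any previously cached version of the same cell.
      public void onWrite(KeyValue kv) {
        cells.put(new CellKey(kv.getRow(), kv.getFamily(), kv.getQualifier()), kv);
      }

      // Only fully specified point lookups are served; anything else misses.
      public KeyValue get(byte[] row, byte[] family, byte[] qualifier) {
        return cells.get(new CellKey(row, family, qualifier));
      }
    }

A rowKey-only Get would still fall through to the normal read path, exactly
because this map cannot prove it holds every cell of the row.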
> > On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
> >
> >> 1. 2KB can be too large for some applications. For example, some of our
> >> k-v sizes are < 100 bytes combined.
> >> 2. These tables (from 1.) do not benefit from the block cache at all (we
> >> did not try a 100 B block size yet :)
> >> 3. And Matt is absolutely right: a small block size is expensive.
> >>
> >> How about doing point queries on the K-V cache and bypassing the K-V
> >> cache on all Scans (when someone really needs them)?
> >> Implement the K-V cache as a coprocessor application?
> >>
> >> Invalidation of a K-V entry is not necessary if all upsert operations go
> >> through the K-V cache first, i.e. if it sits in front of the MemStore.
> >> There will be no "stale or invalid" data situation in this case. Correct?
> >> No need for data to be sorted and no need for data to be merged
> >> into a scan (we do not use the K-V cache for Scans).
> >>
> >>
> >> Best regards,
> >> Vladimir Rodionov
> >> Principal Platform Engineer
> >> Carrier IQ, www.carrieriq.com
> >> e-mail: [EMAIL PROTECTED]
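A very rough sketch of the coprocessor wiring Vladimir suggests, reusing the
PointKeyValueCache sketch shown earlier in the thread. The hook signatures
follow the 0.92/0.94-era RegionObserver API and the class is illustrative,
not a working patch:

    import java.util.List;
    import java.util.Map;
    import java.util.NavigableSet;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

    public class KeyValueCacheObserver extends BaseRegionObserver {

      private final PointKeyValueCache cache = new PointKeyValueCache();

      // Point Gets naming exactly one family and qualifier can be answered
      // from the cache; everything else goes down the normal read path.
      // Scans never reach this hook, so they bypass the cache automatically.
      @Override
      public void preGet(ObserverContext<RegionCoprocessorEnvironment> c,
                         Get get, List<KeyValue> results) {
        KeyValue hit = lookupSingleCell(get);
        if (hit != null) {
          results.add(hit);
          c.bypass();   // serve from cache, skip MemStore/StoreFiles
        }
      }

      // Every upsert passes through the cache before the MemStore, which is
      // why no separate invalidation step is needed, as argued above.
      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> c,
                         Put put, WALEdit edit, boolean writeToWAL) {
        for (List<KeyValue> kvs : put.getFamilyMap().values()) {
          for (KeyValue kv : kvs) {
            cache.onWrite(kv);
          }
        }
      }

      // Invented helper: returns a cached cell only when the Get specifies
      // exactly one family and one qualifier, otherwise reports a miss.
      private KeyValue lookupSingleCell(Get get) {
        Map<byte[], NavigableSet<byte[]>> families = get.getFamilyMap();
        if (families == null || families.size() != 1) return null;
        Map.Entry<byte[], NavigableSet<byte[]>> e =
            families.entrySet().iterator().next();
        NavigableSet<byte[]> qualifiers = e.getValue();
        if (qualifiers == null || qualifiers.size() != 1) return null;
        return cache.get(get.getRow(), e.getKey(), qualifiers.first());
      }
    }

A real implementation would still need sizing and eviction, but the
bypass-on-Scan behaviour falls out naturally because scans use the scanner
hooks rather than preGet.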
> >>
> >> ________________________________________
> >> From: Matt Corgan [[EMAIL PROTECTED]]
> >> Sent: Wednesday, April 04, 2012 11:40 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: keyvalue cache
> >>
> >> I guess the benefit of the KV cache is that you are not holding entire
> >> 64K blocks in memory when you only care about 200 bytes of them.  Would
> >> an alternative be to set a small block size (2KB or less)?
> >>
> >> The problems with small block sizes would be expensive block cache
> >> management overhead and inefficient scanning IO due to lack of
> >> read-ahead.  Maybe improving the cache management and read-ahead would
> >> be more general improvements that don't add as much complexity?
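For comparison, the small-block alternative is already just a per-family
setting. A sketch with the 0.92-era admin API; the table and family names
are made up, and the shell equivalent is shown in the comment:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class SmallBlocks {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HColumnDescriptor cf = new HColumnDescriptor("d");
        cf.setBlocksize(2 * 1024);   // 2KB data blocks instead of the 64KB default
        admin.disableTable("mytable");
        // shell equivalent: alter 'mytable', {NAME => 'd', BLOCKSIZE => '2048'}
        admin.modifyColumn("mytable", cf);
        admin.enableTable("mytable");
      }
    }

Smaller blocks mean a larger block index and many more blocks for the LRU
block cache to track, which is the management overhead mentioned above.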
> >>
> >> I'm having a hard time envisioning how you would do invalidations on the
> >> KV cache and how you would merge its entries into a scan, etc.  Would it
> >> basically be a memstore in front of the memstore where KVs get
> >> individually invalidated instead of bulk-flushed?  Would it be sorted or
> >> hashed?
> >>
> >> Matt
> >>
> >> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
> >>
> >> > As you said, caching the entire row does not make much sense, given
> >> > that the families are by contract the access boundaries. But caching
> >> > column families might be a good trade-off for dealing with the
> >> > per-item overhead.
> >> >
> >> > Also agreed on the cache being configurable at the table or, better,
> >> > the cf level. I think we can do something like enable_block_cache =
> >> > true, enable_kv_cache = false, per column family.
> >> >
> >> > Enis
> >> >
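To make that proposal concrete, the per-family switches Enis describes could
piggyback on HColumnDescriptor the way BLOCKCACHE already does. In the sketch
below, setBlockCacheEnabled is the existing flag, while the KV_CACHE attribute
is purely hypothetical (it is the feature under discussion, not something
HBase has):

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class PerFamilyCacheFlags {
      public static HTableDescriptor describe() {
        HTableDescriptor table = new HTableDescriptor("mytable");  // illustrative name
        HColumnDescriptor cf = new HColumnDescriptor("d");
        cf.setBlockCacheEnabled(true);       // existing per-CF flag (BLOCKCACHE)
        cf.setValue("KV_CACHE", "false");    // hypothetical per-CF flag for the proposed cache
        table.addFamily(cf);
        return table;
      }
    }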
> >> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov