HBase, mail # dev - keyvalue cache


Re: keyvalue cache
Matt Corgan 2012-04-04, 21:53
In the meantime, memcached could provide all those benefits without adding
any complexity to HBase...
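
A minimal sketch of what "memcached in front of HBase" could look like from the client side, using the cache-aside pattern. Everything here is an assumption for illustration: `FakeMemcached` is an in-process stand-in for a real memcached client, `CellStore` stands in for the HBase table, and `cache_key`, `cached_get` and `cached_put` are hypothetical helper names.

```python
class FakeMemcached:
    """In-process stand-in for a memcached client (get/set/delete only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class CellStore:
    """Stand-in for the HBase table: latest value per (row, family, qualifier)."""

    def __init__(self):
        self.cells = {}
        self.reads = 0  # counts reads that reached the backing store

    def get(self, row, family, qualifier):
        self.reads += 1
        return self.cells.get((row, family, qualifier))

    def put(self, row, family, qualifier, value):
        self.cells[(row, family, qualifier)] = value


def cache_key(row, family, qualifier):
    # Hypothetical key layout. Real memcached keys are limited to 250 bytes
    # and cannot contain whitespace, so real keys may need encoding or hashing.
    return f"{row}/{family}:{qualifier}"


def cached_get(cache, store, row, family, qualifier):
    """Point query: try the cache first, fall back to the store on a miss."""
    key = cache_key(row, family, qualifier)
    value = cache.get(key)
    if value is None:
        value = store.get(row, family, qualifier)
        if value is not None:
            cache.set(key, value)
    return value


def cached_put(cache, store, row, family, qualifier, value):
    """Write to the store, then drop the cached copy so it cannot go stale."""
    store.put(row, family, qualifier, value)
    cache.delete(cache_key(row, family, qualifier))
```

This only stays consistent if every writer goes through `cached_put`; a write that bypasses it would leave the old value in the cache.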
On Wed, Apr 4, 2012 at 2:46 PM, Matt Corgan <[EMAIL PROTECTED]> wrote:

> It could act like a HashSet of KeyValues keyed on the
> rowKey+family+qualifier but not including the timestamp.  As writes come in
> it would evict or overwrite previous versions (read-through vs
> write-through).  It would only service point queries where the
> row+fam+qualifier are specified, returning the latest version.  Wouldn't be
> able to do a typical rowKey-only Get (scan behind the scenes) because it
> wouldn't know if it contained all the cells in the row, but if you could
> specify all your row's qualifiers up-front it could work.
>
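
The structure described above can be sketched roughly as follows. This is an illustration, not HBase code: `CellKey`, `KeyValue` and `KeyValueCache` are hypothetical names, the LRU bound is an added assumption, and the sketch assumes writes for a given cell arrive with non-decreasing timestamps.

```python
from collections import OrderedDict
from typing import List, NamedTuple, Optional


class CellKey(NamedTuple):
    """Cache key: row + family + qualifier, deliberately without timestamp."""
    row: bytes
    family: bytes
    qualifier: bytes


class KeyValue(NamedTuple):
    key: CellKey
    timestamp: int
    value: bytes


class KeyValueCache:
    """LRU map holding at most one (the latest) version per cell."""

    def __init__(self, max_entries: int, write_through: bool = True):
        self.max_entries = max_entries
        self.write_through = write_through
        self._entries = OrderedDict()  # CellKey -> KeyValue, oldest first

    def __len__(self) -> int:
        return len(self._entries)

    def populate(self, kv: KeyValue) -> None:
        """Insert or replace an entry, evicting least recently used ones."""
        self._entries[kv.key] = kv
        self._entries.move_to_end(kv.key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def on_write(self, kv: KeyValue) -> None:
        if not self.write_through:
            # Read-through mode: evict, and let the next read repopulate.
            self._entries.pop(kv.key, None)
            return
        # Write-through mode: overwrite the previous version. An older
        # timestamp written after its newer version was evicted would be
        # cached incorrectly; this sketch ignores that case.
        current = self._entries.get(kv.key)
        if current is None or kv.timestamp >= current.timestamp:
            self.populate(kv)

    def get(self, row: bytes, family: bytes, qualifier: bytes) -> Optional[KeyValue]:
        """Point query for a fully specified cell; None means cache miss."""
        kv = self._entries.get(CellKey(row, family, qualifier))
        if kv is not None:
            self._entries.move_to_end(kv.key)
        return kv

    def get_cells(self, row: bytes, family: bytes,
                  qualifiers: List[bytes]) -> Optional[List[KeyValue]]:
        """Multi-cell Get with all qualifiers named up front.

        Returns None if any cell is missing, because the cache cannot tell
        "not cached" apart from "does not exist".
        """
        cells = [self.get(row, family, q) for q in qualifiers]
        return None if any(c is None for c in cells) else cells
```

A rowKey-only Get has no equivalent here, for the reason given above: the cache has no way to know whether it holds every cell of the row.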
>
> On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>
>> 1. 2KB can be too large for some applications. For example, some of our
>> k-v pairs are < 100 bytes combined.
>> 2. These tables (from 1.) do not benefit from block cache at all (we did
>> not try 100 B block size yet :)
>> 3. And Matt is absolutely right: small block size is expensive
>>
>> How about doing point queries on the K-V cache and bypassing the K-V cache
>> on all Scans (when someone really needs this)?
>> Implement the K-V cache as a coprocessor application?
>>
>> Invalidation of a K-V entry is not necessary if the K-V cache sits in front
>> of the MemStore and all upsert operations go through it first.
>> There will be no "stale or invalid" data situation in this case. Correct?
>> No need for data to be sorted and no need for data to be merged
>> into a scan (we do not use K-V cache for Scans)
>>
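
The argument above can be sketched as a toy model. The names (`CachedStore`, `kv_cache`, `memstore`) are hypothetical, the cache is unbounded for brevity, and the `memstore` dict stands in for everything behind the cache (MemStore plus store files). It is not a coprocessor implementation.

```python
class CachedStore:
    """Toy model: an unsorted K-V cache sitting in front of the store."""

    def __init__(self):
        self.kv_cache = {}    # hash map, unsorted: (row, qualifier) -> value
        self.memstore = {}    # stand-in for the sorted store behind the cache
        self.store_reads = 0  # point reads that missed the cache

    def put(self, row, qualifier, value):
        # Every upsert passes through the cache first, so a cached entry is
        # always the latest value and never needs separate invalidation.
        self.kv_cache[(row, qualifier)] = value
        self.memstore[(row, qualifier)] = value

    def evict(self, row, qualifier):
        # Dropping an entry is always safe: the store still has the value.
        self.kv_cache.pop((row, qualifier), None)

    def get(self, row, qualifier):
        """Point query: served from the cache when possible."""
        key = (row, qualifier)
        if key in self.kv_cache:
            return self.kv_cache[key]
        self.store_reads += 1
        value = self.memstore.get(key)
        if value is not None:
            self.kv_cache[key] = value
        return value

    def scan(self, start_row, stop_row):
        """Scans bypass the cache, so cached data is never sorted or merged."""
        return [(key, value) for key, value in sorted(self.memstore.items())
                if start_row <= key[0] < stop_row]
```

The "no stale data" property holds only while no write path skips `put`; anything that changes the store without passing through the cache would break it.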
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: [EMAIL PROTECTED]
>>
>> ________________________________________
>> From: Matt Corgan [[EMAIL PROTECTED]]
>> Sent: Wednesday, April 04, 2012 11:40 AM
>> To: [EMAIL PROTECTED]
>> Subject: Re: keyvalue cache
>>
>> I guess the benefit of the KV cache is that you are not holding entire 64K
>> blocks in memory when you only care about 200 bytes of them.  Would an
>> alternative be to set a small block size (2KB or less)?
>>
>> The problems with small block sizes would be expensive block cache
>> management overhead and inefficient scanning IO due to lack of read-ahead.
>>  Maybe improving the cache management and read-ahead would be more general
>> improvements that don't add as much complexity?
>>
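
A back-of-the-envelope calculation for the trade-off above. The 1 GiB cache budget and the 100-byte per-entry overhead are assumptions picked for illustration, and the model is the worst case for block caching: purely random point reads, where each hot 200-byte cell sits in a different block.

```python
KIB = 1024
GIB = 1024 ** 3


def units_cached(budget_bytes, unit_bytes):
    """Number of cache units (blocks or single cells) that fit in the budget."""
    return budget_bytes // unit_bytes


budget = 1 * GIB       # assumed cache size
cell_bytes = 200       # cell size from the message above
entry_overhead = 100   # assumed bookkeeping cost per entry in a KV cache

# With one hot cell per block, cached blocks == cached hot cells.
hot_cells_64k = units_cached(budget, 64 * KIB)
hot_cells_2k = units_cached(budget, 2 * KIB)
hot_cells_kv = units_cached(budget, cell_bytes + entry_overhead)

# Fraction of a 64 KiB block that is actually wanted by the point query.
useful_fraction_64k = cell_bytes / (64 * KIB)

for label, count in [("64 KiB blocks", hot_cells_64k),
                     ("2 KiB blocks", hot_cells_2k),
                     ("KV cache", hot_cells_kv)]:
    print(f"{label:>14}: {count:>9,} hot cells")
print(f"useful bytes per 64 KiB block: {useful_fraction_64k:.2%}")
```

Under these assumptions the 2 KiB configuration also has 32 times as many blocks to track as the 64 KiB one, which is the cache management overhead mentioned above.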
>> I'm having a hard time envisioning how you would do invalidations on the KV
>> cache and how you would merge its entries into a scan, etc.  Would it
>> basically be a memstore in front of the memstore where KVs get individually
>> invalidated instead of bulk-flushed?  Would it be sorted or hashed?
>>
>> Matt
>>
>> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <[EMAIL PROTECTED]> wrote:
>>
>> > As you said, caching the entire row does not make much sense, given that
>> > the families are by contract the access boundaries. But caching column
>> > families might be a good trade-off for dealing with the per-item overhead.
>> >
>> > Also agreed on cache being configurable at the table or better cf level. I
>> > think we can do something like enable_block_cache = true,
>> > enable_kv_cache = false, per column family.
>> >
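
A sketch of how the per-column-family switches proposed above might be modelled. The two flag names come from the message; they are a proposal in this thread, not existing HBase settings. `ColumnFamilyCacheConfig`, `caches_for_read` and the family names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnFamilyCacheConfig:
    """Per-column-family cache switches, named as proposed in the thread."""
    enable_block_cache: bool = True
    enable_kv_cache: bool = False


def caches_for_read(config, is_point_query):
    """Return the caches a read would consult, in lookup order."""
    caches = []
    # The KV cache only serves fully specified point queries, never scans.
    if is_point_query and config.enable_kv_cache:
        caches.append("kv_cache")
    if config.enable_block_cache:
        caches.append("block_cache")
    return caches


# Hypothetical table with one scan-heavy and one point-query-heavy family.
table_config = {
    "events": ColumnFamilyCacheConfig(),
    "profile": ColumnFamilyCacheConfig(enable_block_cache=False,
                                       enable_kv_cache=True),
}
```

With this layout, a scan over `profile` would consult no cache at all, which matches the idea that a family can use the key-value cache and skip the block cache.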
>> > Enis
>> >
>> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov <[EMAIL PROTECTED]> wrote:
>> >
>> > > Usually makes sense for tables with mostly random access (point
>> > > queries). For short and long scans the block cache is preferable.
>> > > Cassandra has it (row cache), but since they cache the whole row
>> > > (which can be very large), in many cases it has subpar performance.
>> > > It makes sense to make caching configurable: a table can use the
>> > > key-value cache and not use the block cache, and vice versa.
>> > >
>> > > Best regards,
>> > > Vladimir Rodionov
>> > > Principal Platform Engineer
>> > > Carrier IQ, www.carrieriq.com
>> > > e-mail: [EMAIL PROTECTED]
>> > >