HBase, mail # user - MemStore and prefix encoding


Re: MemStore and prefix encoding
Enis Söztutar 2012-08-28, 19:28
I would still caution against relying on the sort order among values with
the same cf, qualifier, and timestamp. For example, if there is a Delete, it
will eclipse subsequent Puts with the same timestamp, even though the Put
happened after the Delete.

Enis
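
To make the caution above concrete, here is a minimal toy model in plain Java. It is not HBase code: the class, the `Cell` type, the `compare` ordering, and the `scan` method are all hypothetical names invented for illustration. It assumes only what the thread describes: a forward-only scan keeps the first cell it sees for a given timestamp and skips later ones, and a Delete at a timestamp masks Puts at that same timestamp regardless of which was written first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only, not HBase code: every name below is hypothetical.
class SameTimestampSketch {
    enum Type { PUT, DELETE }

    // One cell of a single row/family/qualifier, so only timestamp, type and value vary.
    static final class Cell {
        final long timestamp;
        final Type type;
        final String value;

        Cell(long timestamp, Type type, String value) {
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }
    }

    // Assumed ordering: newer timestamps first; at equal timestamps a Delete sorts
    // ahead of a Put. Same-type ties keep their input order (List.sort is stable),
    // standing in for "whatever order the store happens to present them in".
    static int compare(Cell a, Cell b) {
        if (a.timestamp != b.timestamp) {
            return Long.compare(b.timestamp, a.timestamp);
        }
        if (a.type != b.type) {
            return a.type == Type.DELETE ? -1 : 1;
        }
        return 0;
    }

    // Forward-only scan: keep the first Put seen per timestamp, skip later duplicates,
    // and skip any Put whose timestamp already carries a Delete marker.
    static List<String> scan(List<Cell> written) {
        List<Cell> sorted = new ArrayList<>(written);
        sorted.sort(SameTimestampSketch::compare);

        Set<Long> deleted = new HashSet<>();
        List<String> visible = new ArrayList<>();
        Long previousTs = null;
        for (Cell c : sorted) {
            if (c.type == Type.DELETE) {
                deleted.add(c.timestamp);
                continue;
            }
            boolean duplicate = previousTs != null && previousTs.longValue() == c.timestamp;
            previousTs = c.timestamp;
            if (duplicate || deleted.contains(c.timestamp)) {
                continue;
            }
            visible.add(c.value);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(100, Type.PUT, "a"));
        cells.add(new Cell(100, Type.PUT, "b"));      // same timestamp as "a": skipped
        cells.add(new Cell(200, Type.DELETE, null));
        cells.add(new Cell(200, Type.PUT, "c"));      // written after the Delete, still masked
        System.out.println(scan(cells));              // prints [a]
    }
}
```

In this toy, which of "a" and "b" survives depends only on the order the scan meets them, which is exactly the part the thread says should not be relied on in the real system.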

On Mon, Aug 27, 2012 at 9:20 AM, Tom Brown <[EMAIL PROTECTED]> wrote:

> Lars,
>
> I have been relying on the expected behavior (if I write another cell
> with the same {key, family, qualifier, version} it won't return the
> previous one) so your answer was confusing to me. I did more
> research and I found that the HBase guide specifies that behavior (see
> section 5.8.1 of http://hbase.apache.org/book.html).
>
> Have I misunderstood something? Can I rely on behavior that is
> specified in the guide?
>
> Thanks again!
>
> --Tom
>
> On Sun, Aug 26, 2012 at 6:43 AM, Eric Czech <[EMAIL PROTECTED]> wrote:
> > Thanks for the info, Lars!
> >
> > In the potential use case I have for writing at the same timestamp,
> > the values would always be the same anyway, so I should be good.
> >
> > On Sat, Aug 25, 2012 at 9:12 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >> I checked the code to be sure...
> >>
> >>
> >> In ScanWildcardColumnTracker we have this:
> >>
> >>       if (sameAsPreviousTSAndType(timestamp, type)) {
> >>         return ScanQueryMatcher.MatchCode.SKIP;
> >>       }
> >>
> >>
> >> And in ExplicitColumnTracker there is this:
> >>
> >>         if (sameAsPreviousTS(timestamp)) {
> >>           //If duplicate, skip this Key
> >>           return ScanQueryMatcher.MatchCode.SKIP;
> >>         }
> >>
> >>
> >> I.e. the first KV is kept and the subsequent ones (with the same TS)
> >> are skipped.
> >>
> >> My point remains, though: Do not rely on this.
> >> (Though it will probably stay the way it is, because that is the most
> >> efficient way to handle this in forward-only scanners.)
> >>
> >> -- Lars
> >>
> >>
> >>
> >> ________________________________
> >>  From: Tom Brown <[EMAIL PROTECTED]>
> >> To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>; lars hofhansl <[EMAIL PROTECTED]>
> >> Sent: Saturday, August 25, 2012 4:54 PM
> >> Subject: Re: MemStore and prefix encoding
> >>
> >>
> >> I thought when multiple values with the same key, family, qualifier and
> >> timestamps were written, the one that was written latest (as determined by
> >> position in the store) would be read. Is that not the case?
> >>
> >> --Tom
> >>
> >> On Saturday, August 25, 2012, lars hofhansl <[EMAIL PROTECTED]> wrote:
> >>> The prefix encoding applies to blocks in the HFiles and in the block
> >>> cache, but not to the memstore.
> >>>
> >>>
> >>> #1 Yes. Each column family is its own store. All stores are flushed
> >>> together, so having many adds overhead (especially if a few tend to hold a
> >>> lot of data but the others don't, leading to very many small store files
> >>> that need to be compacted).
> >>> #2 There is only one value for a given key, column family, qualifier,
> >>> and timestamp (if you write multiple with the same timestamp it is
> >>> undefined which one you'll get back when you read the next time). So that
> >>> does not make sense. Writes with the same key, column family, and qualifier
> >>> (each with a different timestamp) count towards the version limit.
> >>>
> >>> -- Lars
> >>>
> >>>
> >>> ----- Original Message -----
> >>> From: Eric Czech <[EMAIL PROTECTED]>
> >>> To: user <[EMAIL PROTECTED]>
> >>> Cc:
> >>> Sent: Saturday, August 25, 2012 2:44 PM
> >>> Subject: MemStore and prefix encoding
> >>>
> >>> Hi everyone,
> >>>
> >>> Does prefix encoding apply to rows in MemStores or does it only apply
> >>> to rows on disk in HFiles?  I'm trying to decide if I should still
> >>> favor larger values in order to not repeat keys, column families, and
> >>> qualifiers more than necessary and while prefix encoding seems to
> >>> negate that concern for storage on disk, I'm not sure if it's still
> >>> applicable to in-memory storage.
> >>>
> >>> Also, I had two other quick (unrelated) questions and I assume it'd be