-
MemStore and prefix encoding
Eric Czech 2012-08-25, 21:44
Hi everyone,
Does prefix encoding apply to rows in MemStores or does it only apply to rows on disk in HFiles? I'm trying to decide if I should still favor larger values in order to not repeat keys, column families, and qualifiers more than necessary and while prefix encoding seems to negate that concern for storage on disk, I'm not sure if it's still applicable to in-memory storage.
Also, I had two other quick (unrelated) questions and I assume it'd be less annoying if I put them all in one email:
1. Do column families defined for a table introduce any overhead for rows that don't put any values in them? I don't think that's the case but I wanted to be sure.
2. Do writes with the same key, column family, qualifier, and timestamp count towards the version limit?
Thanks for the help!
-
Re: MemStore and prefix encoding
lars hofhansl 2012-08-25, 21:57
The prefix encoding applies to blocks in the HFiles and in the block cache, but not to the memstore.
#1 Yes. Each column family is its own store. All stores are flushed together, so having many adds overhead (especially if a few tend to hold a lot of data but the others don't, leading to very many small store files that need to be compacted).
#2 There is only one key with the same key, column family, qualifier, and timestamp (if you write multiple with the same timestamp, it is undefined which one you'll get back when you read the next time). So that does not make sense. Writes with the same key, column family, and qualifier (each with a different timestamp) count towards the version limit.
-- Lars
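To make the storage trade-off concrete, here is a minimal sketch of the idea behind prefix encoding: each key in a sorted run is stored as "number of bytes shared with the previous key" plus the remaining suffix. This is a toy model, not HBase's actual block encoder; the class name, the flattened "row/family:qualifier" key layout, and the one-byte length field are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Toy illustration only: this is NOT HBase's data block encoder.
// It shows why repeated key prefixes are cheap once each key is stored
// relative to the previous key, and expensive when stored verbatim.
class PrefixEncodingSketch {

    // Number of leading bytes that a and b have in common.
    static int commonPrefixLength(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Bytes needed when every key is stored in full (no encoding).
    static int rawSize(List<String> sortedKeys) {
        int total = 0;
        for (String key : sortedKeys) {
            total += key.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Bytes needed when each key is stored as one length byte
    // (assumed to be enough for the shared-prefix length) plus its suffix.
    static int prefixEncodedSize(List<String> sortedKeys) {
        int total = 0;
        byte[] previous = new byte[0];
        for (String key : sortedKeys) {
            byte[] current = key.getBytes(StandardCharsets.UTF_8);
            int shared = commonPrefixLength(previous, current);
            total += 1 + (current.length - shared);
            previous = current;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical flattened keys: row / family : qualifier.
        List<String> keys = Arrays.asList("row1/cf:a", "row1/cf:b", "row2/cf:a");
        System.out.println("raw bytes:     " + rawSize(keys));
        System.out.println("encoded bytes: " + prefixEncodedSize(keys));
    }
}
```

Since the memstore does not get this treatment, the raw figure is the better guide to in-memory cost, while the encoded figure is closer to what ends up in encoded blocks.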
-
Re: MemStore and prefix encoding
Tom Brown 2012-08-25, 23:54
I thought when multiple values with the same key, family, qualifier and timestamps were written, the one that was written latest (as determined by position in the store) would be read. Is that not the case?
--Tom
-
Re: MemStore and prefix encoding
lars hofhansl 2012-08-26, 01:12
I checked the code to be sure... In ScanWildcardColumnTracker we have this:
    if (sameAsPreviousTSAndType(timestamp, type)) {
      return ScanQueryMatcher.MatchCode.SKIP;
    }
And in ExplicitColumnTracker there is this:
    if (sameAsPreviousTS(timestamp)) {
      // If duplicate, skip this Key
      return ScanQueryMatcher.MatchCode.SKIP;
    }
I.e. the first KV is kept and the subsequent ones (with the same TS) are skipped.
My point remains, though: Do not rely on this. (Though it will probably stay the way it is, because that is the most efficient way to handle this in forward only scanners.)
-- Lars
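The skip logic quoted above can be modeled in a few lines. The following is a simplified sketch, not the real tracker classes: it assumes the cells of one column arrive already sorted, skips any cell whose timestamp equals the previous one, and counts only distinct timestamps towards a hypothetical maxVersions limit. All names here are made up for illustration. Which of several same-timestamp writes arrives first is decided by the sort order, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a forward-only version tracker for a single column.
// Hypothetical names; this is not HBase's ScanWildcardColumnTracker.
class VersionTrackerSketch {
    enum MatchCode { INCLUDE, SKIP }

    private final int maxVersions;
    private int versionsIncluded = 0;
    private boolean sawCell = false;
    private long previousTimestamp;

    VersionTrackerSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Called once per cell, in scan order.
    MatchCode check(long timestamp) {
        if (sawCell && timestamp == previousTimestamp) {
            // Same timestamp as the cell just seen: skip it, and do not
            // count it towards the version limit.
            return MatchCode.SKIP;
        }
        sawCell = true;
        previousTimestamp = timestamp;
        if (versionsIncluded >= maxVersions) {
            return MatchCode.SKIP; // version limit already reached
        }
        versionsIncluded++;
        return MatchCode.INCLUDE;
    }

    // Runs the tracker over timestamps given in scan order (newest first)
    // and returns the ones a reader would see.
    static List<Long> visibleTimestamps(long[] timestampsNewestFirst, int maxVersions) {
        VersionTrackerSketch tracker = new VersionTrackerSketch(maxVersions);
        List<Long> visible = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            if (tracker.check(ts) == MatchCode.INCLUDE) {
                visible.add(ts);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Two writes at timestamp 30, then one each at 20 and 10.
        System.out.println(visibleTimestamps(new long[] {30L, 30L, 20L, 10L}, 2));
    }
}
```

In this model the second write at timestamp 30 neither shows up nor uses up a version slot, which matches the point that only writes with different timestamps count towards the limit.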
-
Re: MemStore and prefix encoding
Eric Czech 2012-08-26, 12:43
Thanks for the info, Lars!
In the potential use case I have for writing at the same timestamp, the values would always be the same anyways so I should be good.
-
Re: MemStore and prefix encoding
Tom Brown 2012-08-27, 16:20
Lars,
I have been relying on the expected behavior (if I write another cell with the same {key, family, qualifier, version}, it won't return the previous one), so your answer was confusing to me. I did more research and found that the HBase guide specifies that behavior (see section 5.8.1 of http://hbase.apache.org/book.html).
Have I misunderstood something? Can I rely on behavior that is specified in the guide?
Thanks again!
--Tom
-
Re: MemStore and prefix encoding
Stack 2012-08-27, 20:30
On Mon, Aug 27, 2012 at 9:20 AM, Tom Brown <[EMAIL PROTECTED]> wrote:
> Have I misunderstood something? Can I rely on behavior that is specified in the guide?
If the code AND the refguide say the same thing, apart from that being a minor miracle, I'd say it's unlikely the behavior will change, not w/o really good reason (we don't etch anything in stone around these parts).
St.Ack
-
Re: MemStore and prefix encoding
Lars H 2012-08-27, 22:52
Oops. The KVs are sorted in reverse chronological order, so I was wrong: it'll return the newest version. Sorry about the confusion. The book is correct.
-- Lars
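A small sketch of why reverse chronological ordering makes the newest write win. It assumes that cells of one column sort by timestamp descending and that ties are broken by a write sequence number, later writes first; the writeSeq field and all names are hypothetical stand-ins for whatever internal ordering the server uses, not HBase's own classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model of "newest first" ordering for the cells of a single column.
// writeSeq is a hypothetical stand-in for the server's internal write order.
class NewestFirstSketch {
    static final class Cell {
        final long timestamp;
        final long writeSeq;
        final String value;

        Cell(long timestamp, long writeSeq, String value) {
            this.timestamp = timestamp;
            this.writeSeq = writeSeq;
            this.value = value;
        }
    }

    // Newer timestamps sort first; for equal timestamps, later writes sort first.
    static final Comparator<Cell> NEWEST_FIRST = (a, b) -> {
        int byTimestamp = Long.compare(b.timestamp, a.timestamp);
        if (byTimestamp != 0) {
            return byTimestamp;
        }
        return Long.compare(b.writeSeq, a.writeSeq);
    };

    // A forward-only reader keeps the first cell in sort order.
    static String readLatest(List<Cell> cells) {
        if (cells.isEmpty()) {
            return null;
        }
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(NEWEST_FIRST);
        return sorted.get(0).value;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell(100L, 1L, "first write"),
                new Cell(100L, 2L, "second write"),
                new Cell(90L, 3L, "older timestamp"));
        System.out.println(readLatest(cells));
    }
}
```

Under these assumptions, "keep the first KV and skip the rest" and "return the most recently written value" are the same thing, because the most recent write is the one that sorts first.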
-
Re: MemStore and prefix encoding
lars hofhansl 2012-08-27, 23:40
Also confirmed via experiment (in the memstore, store files, mixed store files, mixed store files and memstore).
-- Lars
-
Re: MemStore and prefix encoding
Joe Pallas 2012-08-28, 16:59
On Aug 25, 2012, at 2:57 PM, lars hofhansl wrote:
> Each column family is its own store. All stores are flushed together, so have many add overhead (especially if a few tend to hold a lot of data, but the others don't, leading to very many small store files that need to be compacted).
If there are no writes at all to a CF, will a flush still create a new store file? I’m thinking about a case where one CF is basically write-once and another CF gets frequent updates.
Thanks. joe
-
Re: MemStore and prefix encoding
Stack 2012-08-28, 17:54
On Tue, Aug 28, 2012 at 9:59 AM, Joe Pallas <[EMAIL PROTECTED]> wrote:
> If there are no writes at all to a CF, will a flush still create a new store file? I’m thinking about a case where one CF is basically write-once and another CF gets frequent updates.
I was going to say we won't write a file if there are no KVs, but looking at the code, it looks like we could write a file w/ nothing but metadata: http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#725
I see a check up in HRegion where we look at memstore size, the aggregated size of all CFs, and if it is null we'll skip a flush, but I did not see an equivalent check at the CF level (after three minutes of looking). We should add one if there isn't one already.
St.Ack
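To illustrate the concern, here is a toy model of a region whose stores are all flushed together. It is a sketch under stated assumptions, not HBase code: the class, the skipEmptyStores flag (standing in for the per-CF check suggested above), and the idea that an empty store still produces one file per flush are all hypothetical, since the thread leaves open whether a metadata-only file is really written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of "all stores in a region flush together".
// Hypothetical names and behavior; not taken from HRegion or Store.
class FlushSketch {
    private final Map<String, Long> memstoreBytes = new LinkedHashMap<>();
    private final Map<String, Integer> storeFiles = new LinkedHashMap<>();
    private final boolean skipEmptyStores;

    FlushSketch(boolean skipEmptyStores, String... families) {
        this.skipEmptyStores = skipEmptyStores;
        for (String family : families) {
            memstoreBytes.put(family, 0L);
            storeFiles.put(family, 0);
        }
    }

    // Buffers a write of the given size in one column family's memstore.
    void write(String family, long bytes) {
        memstoreBytes.merge(family, bytes, Long::sum);
    }

    void flush() {
        long total = 0;
        for (long bytes : memstoreBytes.values()) {
            total += bytes;
        }
        if (total == 0) {
            return; // region-level check: nothing buffered anywhere
        }
        for (Map.Entry<String, Long> store : memstoreBytes.entrySet()) {
            if (skipEmptyStores && store.getValue() == 0L) {
                continue; // the assumed per-CF check
            }
            storeFiles.merge(store.getKey(), 1, Integer::sum);
            store.setValue(0L);
        }
    }

    int fileCount(String family) {
        return storeFiles.get(family);
    }

    // One write to "cold", then three write-and-flush rounds on "hot".
    static int coldFilesAfterWorkload(boolean skipEmptyStores) {
        FlushSketch region = new FlushSketch(skipEmptyStores, "hot", "cold");
        region.write("cold", 10L);
        region.flush();
        for (int round = 0; round < 3; round++) {
            region.write("hot", 100L);
            region.flush();
        }
        return region.fileCount("cold");
    }

    public static void main(String[] args) {
        System.out.println("cold files without per-CF check: " + coldFilesAfterWorkload(false));
        System.out.println("cold files with per-CF check:    " + coldFilesAfterWorkload(true));
    }
}
```

In the model, the write-once family keeps accumulating files only when the per-CF check is absent, which is the extra compaction work the question is worried about.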
-
Re: MemStore and prefix encoding
Enis Söztutar 2012-08-28, 19:28
I would still caution against relying on the sort order between values with the same CF, qualifier, and timestamp. If, for example, there is a Delete, it will eclipse subsequent Puts with the same timestamp, even though the Put happened after the Delete.
Enis
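The Delete case can be sketched the same way. This toy model assumes a delete marker for an exact timestamp hides every Put at that timestamp for as long as the marker exists, regardless of the order in which the operations were written. HBase has several delete types and eventually removes markers during compaction; only the exact-version case is modeled here, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: a delete marker at timestamp T masks all puts at timestamp T,
// even puts written after the delete. Hypothetical names, not HBase classes.
class DeleteMaskSketch {
    enum Type { PUT, DELETE }

    static final class Cell {
        final Type type;
        final long timestamp;
        final long writeSeq;
        final String value;

        Cell(Type type, long timestamp, long writeSeq, String value) {
            this.type = type;
            this.timestamp = timestamp;
            this.writeSeq = writeSeq;
            this.value = value;
        }
    }

    // Returns the visible value for one column, or null if nothing is visible.
    static String read(List<Cell> cells) {
        Set<Long> deletedTimestamps = new HashSet<>();
        for (Cell cell : cells) {
            if (cell.type == Type.DELETE) {
                deletedTimestamps.add(cell.timestamp);
            }
        }
        Cell best = null;
        for (Cell cell : cells) {
            if (cell.type != Type.PUT || deletedTimestamps.contains(cell.timestamp)) {
                continue; // masked by a delete marker, write order is ignored
            }
            boolean newer = best == null
                    || cell.timestamp > best.timestamp
                    || (cell.timestamp == best.timestamp && cell.writeSeq > best.writeSeq);
            if (newer) {
                best = cell;
            }
        }
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell(Type.PUT, 100L, 1L, "v1"),
                new Cell(Type.DELETE, 100L, 2L, null),
                new Cell(Type.PUT, 100L, 3L, "v2")); // written after the delete
        System.out.println(read(cells));
    }
}
```

A Put written after the Delete with a different, newer timestamp would be visible again in this model, which is one reason to let timestamps advance rather than reuse them.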