Eric Czech 2012-08-07, 13:35
Hello everyone,
I'm trying to store many small values in indexes created via MR jobs, and I was hoping to get some advice on how to structure my rows. Essentially, I have complete control over how large the rows should be, as the values are small, consistent in size, and can be grouped together in any way I'd like. My question then is, what's the ideal size for a row in HBase, in bytes? I'm trying to determine how to group my values together into larger values, and I think having a target size to hit would make that a lot easier.
I know fewer rows is generally better to avoid the repetitive storage of keys, cfs, and qualifiers provided that those rows still suit a given application, but I'm not sure at what point the scale will tip in the other direction and I'll start to see undue memory pressure or compaction issues with rows that are too large.
Thanks in advance!
Jean-Daniel Cryans 2012-08-07, 18:26
Hi Eric,
An ideal cell size would probably be the size of a block, so 64KB including the keys. Having bigger cells would inflate the size of your blocks but then you'd be outside of the normal HBase settings.
That, and do some experiments.
J-D
Eric Czech 2012-08-09, 00:32
That's exactly the sort of target I was looking for -- thanks for the help!
I'll probably shoot for something close to 48KB so I don't exceed that block size.
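For anyone sizing grouped values the same way, here is a minimal sketch of the arithmetic, not something from the thread itself. It assumes the classic KeyValue layout, where each cell carries roughly 20 bytes of fixed overhead (key length, value length, row length, family length, timestamp, type) on top of the row key, family, and qualifier bytes. The function names and the example lengths (16-byte row key, 1-byte family and qualifier, 8-byte values) are hypothetical, and the overhead figure should be checked against your HBase version.

```python
# Approximate fixed per-cell overhead in bytes, assuming the classic KeyValue
# layout: 4 (key length) + 4 (value length) + 2 (row length)
# + 1 (family length) + 8 (timestamp) + 1 (key type).
KEYVALUE_FIXED_OVERHEAD = 20


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Estimate the serialized size of one cell, keys included."""
    return (KEYVALUE_FIXED_OVERHEAD + row_len + family_len
            + qualifier_len + value_len)


def values_per_cell(target_bytes, row_len, family_len, qualifier_len, value_len):
    """How many fixed-size values fit in one cell without exceeding the target."""
    budget = target_bytes - cell_size(row_len, family_len, qualifier_len, 0)
    if budget < value_len:
        raise ValueError("target is too small to hold even one value")
    return budget // value_len


if __name__ == "__main__":
    target = 48 * 1024  # stay under the 64KB default block size
    n = values_per_cell(target, row_len=16, family_len=1,
                        qualifier_len=1, value_len=8)
    print(n, cell_size(16, 1, 1, n * 8))
```

This is only an estimate of the on-disk cell size before compression or block encoding, so it is worth confirming against real store files, as J-D suggests.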