edward choi 2011-10-04, 05:58
Jean-Daniel Cryans 2011-10-06, 17:49
Re: Adjusting column value size.
edward choi 2011-10-07, 14:24
Yes, I need all of those ints at the same time. And no, there is no streaming.
I have decided to pack 1024 ints into one cell so that each cell would be of
I am already using LZO on my tables.
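As a rough illustration of the packing I have in mind, here is a minimal sketch. The class and method names (IntPacking, pack, unpack) are hypothetical, and it assumes fixed-width 4-byte big-endian ints written with java.nio.ByteBuffer; it does not touch the HBase client API at all, it only builds and decodes the byte[] that would be stored as one cell value.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: packs ints into one cell value and unpacks them again.
final class IntPacking {

    // Concatenate the ints as 4-byte big-endian values (ByteBuffer's default order).
    static byte[] pack(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Split a cell value back into ints; the length must be a multiple of 4.
    static int[] unpack(byte[] cell) {
        if (cell.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("cell length is not a multiple of 4: " + cell.length);
        }
        int[] out = new int[cell.length / Integer.BYTES];
        ByteBuffer.wrap(cell).asIntBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        int[] chunk = new int[1024];
        for (int i = 0; i < chunk.length; i++) {
            chunk[i] = i;
        }
        byte[] cell = pack(chunk);
        System.out.println("cell size in bytes: " + cell.length);
        System.out.println("round trip ok: " + Arrays.equals(chunk, unpack(cell)));
    }
}
```

The unpacking step is the "additional process" from my original question: one bulk copy per cell rather than one lookup per integer.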
I'll do some experiments once I finish implementing both approaches.
I'll post a thread with the results when I am done.
Thanks for the advice.
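To put a number on the per-cell overhead mentioned in the quoted reply, here is a back-of-the-envelope sketch. It assumes the classic KeyValue layout without tags, where every cell carries its own row key, family, qualifier, timestamp and type plus the length fields. The row, family and qualifier lengths below are made-up values for illustration, and the estimate ignores compression and block or index overhead.

```java
// Hypothetical estimator for the on-disk size of a cell before compression.
final class CellOverhead {

    // Fixed bytes per cell, assuming the classic KeyValue layout without tags:
    // key length (4) + value length (4) + row length (2) + family length (1)
    // + timestamp (8) + key type (1).
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    // Total bytes for one cell with the given component lengths.
    static long cellBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_BYTES + rowLen + familyLen + qualifierLen + valueLen;
    }

    // Average stored bytes per integer when intsPerCell ints share one cell.
    static double bytesPerInt(int rowLen, int familyLen, int qualifierLen, int intsPerCell) {
        long total = cellBytes(rowLen, familyLen, qualifierLen, intsPerCell * Integer.BYTES);
        return (double) total / intsPerCell;
    }

    public static void main(String[] args) {
        // Assumed lengths, not taken from any real schema.
        int row = 16, family = 1, qualifier = 4;
        System.out.printf("1 int per cell:     %.2f bytes per int%n",
                bytesPerInt(row, family, qualifier, 1));
        System.out.printf("1024 ints per cell: %.2f bytes per int%n",
                bytesPerInt(row, family, qualifier, 1024));
    }
}
```

With these assumed lengths, the one-int-per-cell layout stores 41 bytes of framing and key material for every 4-byte value, while packing 1024 ints per cell spreads the same 41 bytes over 4096 bytes of values.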
2011/10/7 Jean-Daniel Cryans <[EMAIL PROTECTED]>
> (BCC'd common-user@ since this seems strictly HBase related)
> Interesting question... And you probably need all those ints at the same
> time right? No streaming? I'll assume no.
> So the second solution seems better due to the overhead of storing each
> cell. Basically, storing one int per cell you would end up storing more
> keys than values (size wise).
> Another thing is that if you pack enough ints together and there's some sort
> of repetition, you might be able to use LZO compression on that table.
> I'd love to hear about your experimentations once you've done them.
> On Mon, Oct 3, 2011 at 10:58 PM, edward choi <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I have a question regarding the performance and column value size.
> > I need to store per row several million integers. ("Several million" is
> > important here)
> > I was wondering which method would be more beneficial performance wise.
> > 1) Store each integer in its own column so that when a row is read,
> > several million columns will be read with it. The user would then map each
> > column value into some kind of container (ex: vector, ArrayList)
> > 2) Store, for example, a thousand integers in a single column (by
> > concatenating them) so that when a row is read, only several thousand
> > columns will be read with it. The user would have to split each column
> > value into 4-byte chunks and map each decoded integer into some kind of
> > container (ex: vector, ArrayList)
> > I am curious which approach would be better. 1) would read several millions
> > of columns but no additional processing is needed. 2) would read only
> > thousands of columns but additional processing is needed.
> > Any advice would be appreciated.
> > Ed