Hadoop >> mail # user >> Adjusting column value size.

Adjusting column value size.

I have a question about performance and column value size.
I need to store several million integers per row. (The "several million" is
important here.)
I am wondering which of the following approaches would perform better.

1) Store each integer in its own column, so that fetching a row also fetches
several million columns. The client would then map each column value into
some kind of container (e.g. a Vector or ArrayList).
2) Store, for example, a thousand integers in a single column (by
concatenating them), so that fetching a row fetches only several thousand
columns. The client would have to split each column value into 4-byte chunks
and map each decoded integer into some kind of container (e.g. a Vector or
ArrayList).
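Here is a minimal sketch of the packing and splitting that approach 2 requires. It uses plain Java and makes no HBase client calls. It assumes each integer is encoded as 4 big-endian bytes (the `ByteBuffer` default) and uses a hypothetical chunk size of 1000 integers per column. Writing the resulting byte arrays to the table and reading them back are left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class IntChunkCodec {
    // Hypothetical chunk size: number of integers concatenated into one column value.
    static final int CHUNK_SIZE = 1000;

    // Approach 2, write side: concatenate up to CHUNK_SIZE ints into one byte[] per column.
    static List<byte[]> pack(int[] values) {
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < values.length; start += CHUNK_SIZE) {
            int end = Math.min(start + CHUNK_SIZE, values.length);
            ByteBuffer buf = ByteBuffer.allocate((end - start) * Integer.BYTES);
            for (int i = start; i < end; i++) {
                buf.putInt(values[i]); // 4 big-endian bytes per int
            }
            chunks.add(buf.array());
        }
        return chunks;
    }

    // Approach 2, read side: split each column value into 4-byte ints and collect them.
    static List<Integer> unpack(List<byte[]> chunks) {
        List<Integer> out = new ArrayList<>();
        for (byte[] chunk : chunks) {
            ByteBuffer buf = ByteBuffer.wrap(chunk);
            while (buf.remaining() >= Integer.BYTES) {
                out.add(buf.getInt());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = new int[2500];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 7 - 100;
        }
        List<byte[]> chunks = pack(values);
        List<Integer> decoded = unpack(chunks);
        // prints "3 columns, 2500 ints"
        System.out.println(chunks.size() + " columns, " + decoded.size() + " ints");
    }
}
```

With approach 1, each column value would be a single 4-byte array. Decoding it is just `ByteBuffer.wrap(value).getInt()`. For millions of values, collecting into a primitive `int[]` instead of a `List<Integer>` avoids boxing every integer.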

I am curious which approach would be better. Approach 1 fetches several
million columns but needs no additional processing. Approach 2 fetches only
several thousand columns but needs extra processing to split the values.
Any advice would be appreciated.
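The trade-off between the two approaches can be put into rough numbers. The sketch below counts cells per row and approximate bytes read for each approach. The figures of 5,000,000 integers per row and 1,000 integers per chunk are illustrative. `PER_CELL_OVERHEAD` is a hypothetical placeholder for the key bytes stored alongside every value (in a store such as HBase, each cell carries its own row, column and timestamp). It is not a measured figure.

```java
public class CellEstimate {
    // Illustrative workload: integers stored per row (the post says "several million").
    static final long INTS_PER_ROW = 5_000_000L;
    // Hypothetical per-cell key overhead in bytes; a placeholder, not a measured value.
    static final long PER_CELL_OVERHEAD = 40L;

    // Cells needed when chunkSize ints are concatenated per column (ceiling division).
    static long cells(long ints, long chunkSize) {
        return (ints + chunkSize - 1) / chunkSize;
    }

    // Approximate bytes per row: 4 bytes per int plus a fixed overhead per cell.
    static long approxBytes(long ints, long chunkSize) {
        return ints * Integer.BYTES + cells(ints, chunkSize) * PER_CELL_OVERHEAD;
    }

    public static void main(String[] args) {
        // chunk = 1 models approach 1; chunk = 1000 models approach 2.
        for (long chunk : new long[] {1, 1000}) {
            System.out.printf("chunk=%d: %d cells, ~%d bytes%n",
                chunk, cells(INTS_PER_ROW, chunk), approxBytes(INTS_PER_ROW, chunk));
        }
    }
}
```

With these placeholder numbers, approach 1 spends more bytes on per-cell keys than on the integers themselves. Approach 2 makes that overhead negligible. The real ratio depends on the actual key lengths.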