HBase, mail # user - KeyValue size in bytes compared to store files size

Amit Sela 2014-01-15, 13:44
Hi all,
I'm trying to measure the size (in bytes) of the data I'm about to load
into HBase.
I'm using bulk load with PutSortReducer.
All bulk load data is loaded into new regions and is not added to existing
regions.

To measure the size of all KeyValues in a Put object, I iterate over the
Put's familyMap.values() and sum the KeyValue lengths.
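
Summing KeyValue lengths counts more than the raw values. The sketch below assumes the HBase 0.94-style KeyValue layout without tags. In that layout every cell stores its own copy of the row key, family and qualifier, plus 20 bytes of fixed fields: 4 for the key length, 4 for the value length, 2 for the row length, 1 for the family length, 8 for the timestamp and 1 for the type. The code is plain Java and does not call the HBase API. `cellLength` and `rowSize` are hypothetical helpers that model the arithmetic, so a per-row estimate can be reproduced by hand.

```java
import java.nio.charset.StandardCharsets;

public class KeyValueSizeSketch {
    // Fixed bytes per cell in the assumed layout (HBase 0.94-style KeyValue, no tags):
    // 4 (key length) + 4 (value length) + 2 (row length) + 1 (family length)
    // + 8 (timestamp) + 1 (type).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1; // = 20

    // Hypothetical helper: serialized length of one cell under the assumed layout.
    // Note that the row key and family are stored again in every cell.
    static long cellLength(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        return FIXED_OVERHEAD + row.length + family.length + qualifier.length + value.length;
    }

    // Hypothetical helper: sum of cell lengths for one row, i.e. what iterating
    // over a Put's cells and adding up their lengths would produce.
    static long rowSize(byte[] row, byte[] family, String[] qualifiers, String[] values) {
        long total = 0;
        for (int i = 0; i < qualifiers.length; i++) {
            total += cellLength(row, family, utf8(qualifiers[i]), utf8(values[i]));
        }
        return total;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] qualifiers = {"name", "email", "age"};
        String[] values = {"alice", "alice@example.com", "42"};
        long size = rowSize(utf8("user-000123"), utf8("d"), qualifiers, values);
        System.out.println("Estimated uncompressed row size: " + size + " bytes");
    }
}
```

For this made-up row the sketch reports 132 bytes, although the values themselves add up to only 24 bytes. Rows with short values and long keys are therefore dominated by per-cell key overhead. That overhead is also highly repetitive, which matters once compression is applied.
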
After loading the data, I check the region size by summing the sizes of
the region's store files.
Summing the Put objects' sizes predicted ~500MB per region, but in practice
I got ~32MB per region.
The table uses GZ compression, but that alone cannot explain such a large
difference.
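
The KeyValue sum is an uncompressed figure, while the store files on disk hold compressed blocks. The two numbers are only comparable once the actual compression ratio of the data is known. One rough check is to gzip a sample of serialized cells and compare the sizes. The sketch below makes several assumptions. It uses `java.util.zip.GZIPOutputStream` as a stand-in for the table's GZ codec, so the numbers will not match HBase's block-wise compression exactly. It serializes cells in the same assumed 0.94-style layout as above. Its `sample` data is synthetic and highly repetitive, so the ratio it prints says nothing about the real table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

public class GzipRatioSketch {
    // Serializes one cell in an assumed HBase 0.94-style KeyValue layout (no tags):
    // key length, value length, row length, row, family length, family,
    // qualifier, timestamp, type byte, value.
    static byte[] serialize(byte[] row, byte[] family, byte[] qualifier, long ts, byte[] value) {
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLength + value.length);
        buf.putInt(keyLength).putInt(value.length);
        buf.putShort((short) row.length).put(row);
        buf.put((byte) family.length).put(family);
        buf.put(qualifier).putLong(ts);
        buf.put((byte) 4); // type byte; 4 is the code HBase uses for a Put
        buf.put(value);
        return buf.array();
    }

    // Gzips a byte array in memory and returns the compressed bytes.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Hypothetical sample: one small cell per row with sequential row keys and a
    // fixed, arbitrary timestamp. Replace with cells from the real bulk-load input.
    static byte[] sample(int rows) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] family = "d".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "status".getBytes(StandardCharsets.UTF_8);
        byte[] value = "active".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < rows; i++) {
            byte[] row = String.format(Locale.ROOT, "user-%06d", i).getBytes(StandardCharsets.UTF_8);
            out.write(serialize(row, family, qualifier, 1389793440000L, value));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = sample(10_000);
        byte[] compressed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.1fx%n",
                raw.length, compressed.length, (double) raw.length / compressed.length);
    }
}
```

Running the same comparison on a few thousand real cells from the bulk-load input would give a rough idea of how much of the ~500MB vs ~32MB gap compression accounts for.
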

Is summing the Put's KeyValue lengths the correct way to measure a row's
size? And is that number comparable to the store files' size?