HBase user mailing list


KeyValue size in bytes compared to store files size
Hi all,
I'm trying to measure the size (in bytes) of the data I'm about to load
into HBase.
I'm using bulk load with PutSortReducer.
All bulk load data is loaded into new regions and not added to existing
ones.

To count the size of all KeyValues in each Put object, I iterate
over the Put's familyMap.values() and sum the KeyValue lengths.
After loading the data, I check the region size by summing the
values returned by RegionLoad.getStorefileSizeMB().
Summing the Put objects' sizes predicted ~500MB per region, but in practice I
got ~32MB per region.
The table uses GZ compression, but this cannot be the cause of such a large
difference.
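To make explicit what "summing the KeyValue lengths" counts, here is a minimal sketch in plain Java (no HBase dependency). It assumes the classic tag-less KeyValue layout: a 4-byte key length, a 4-byte value length, a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, a 1-byte type, then the value. The `Cell` class, `keyValueLength`, and `putLength` are hypothetical helpers for illustration, not HBase APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class KeyValueSizeEstimate {
    // Assumed fixed overhead: 4-byte key length + 4-byte value length.
    static final int KV_INFRASTRUCTURE = 4 + 4;
    // Assumed key overhead: 2-byte row length + 1-byte family length
    // + 8-byte timestamp + 1-byte type.
    static final int KEY_INFRASTRUCTURE = 2 + 1 + 8 + 1;

    // Hypothetical stand-in for one cell of a Put (not an HBase class).
    public static final class Cell {
        final byte[] row, family, qualifier, value;

        public Cell(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Serialized length of one KeyValue under the assumed layout.
    // Row, family and qualifier are repeated in every cell.
    public static long keyValueLength(Cell c) {
        return KV_INFRASTRUCTURE + KEY_INFRASTRUCTURE
                + c.row.length + c.family.length
                + c.qualifier.length + c.value.length;
    }

    // Sum over all cells, analogous to iterating familyMap.values().
    public static long putLength(List<Cell> cells) {
        long total = 0;
        for (Cell c : cells) {
            total += keyValueLength(c);
        }
        return total;
    }

    public static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell(b("row-0001"), b("cf"), b("q1"), b("v1")),
                new Cell(b("row-0001"), b("cf"), b("q2"), b("v2")));
        System.out.println(putLength(cells)); // 68 (two cells of 34 bytes each)
    }
}
```

Under this assumption, each two-byte value above carries 32 bytes of key and length overhead, so the summed length is an uncompressed, per-cell figure rather than an on-disk one.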

Is summing the Put's KeyValue lengths the correct way to measure a row's size? Is it
comparable to the store file size?
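To compare the two numbers in the same unit, a minimal sketch that converts the summed KeyValue bytes to whole megabytes and computes the predicted/observed ratio. It assumes the per-region MB figure is the store file byte count divided by 1024 * 1024 and truncated to an integer; `toWholeMB` and `ratio` are hypothetical helpers, and the ~500MB and ~32MB inputs are the figures quoted above.

```java
public class StoreFileComparison {
    // Assumption: the reported MB figure is bytes / (1024 * 1024), truncated.
    static final long BYTES_PER_MB = 1024L * 1024L;

    // Convert a summed KeyValue byte count to the same whole-MB unit.
    public static long toWholeMB(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    // Ratio of predicted (summed KeyValue) size to observed store file size.
    public static double ratio(long predictedBytes, long observedMB) {
        return (double) predictedBytes / (observedMB * BYTES_PER_MB);
    }

    public static void main(String[] args) {
        long predictedBytes = 500L * BYTES_PER_MB; // ~500MB predicted per region
        long observedMB = 32;                      // ~32MB observed per region
        System.out.println(toWholeMB(predictedBytes));         // 500
        System.out.println(ratio(predictedBytes, observedMB)); // 15.625
    }
}
```

With the quoted figures, the gap is a factor of about 15.6 per region.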

Thanks,
Amit.