HBase >> mail # user >> KeyValue size in bytes compared to store files size


Re: KeyValue size in bytes compared to store files size
There can be a lot of duplication in what ends up in HFiles but 500MB ->
32MB does seem too good to be true.

Could you try writing without GZIP or mess with the hfile reader[1] to see
what your keys look like when at rest in an HFile (and maybe save the
decompressed hfile to compare sizes?)

St.Ack
1. http://hbase.apache.org/book.html#hfile
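
To make the duplication point concrete, here is a minimal sketch in plain Java (no HBase dependency) that serializes synthetic cells in the KeyValue layout described in the HBase book and then gzips them. The class name, row keys, qualifiers, values and the fixed timestamp are all made up for illustration, and the result only hints at how repetitive row/family/qualifier bytes compress; it is not a measurement of any real table or of what an actual HFile would contain.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of HBase.
class KeyValueSizeSketch {

    // Serialized length of one cell in the KeyValue layout:
    // keyLength(4) + valueLength(4) + rowLength(2) + row + familyLength(1)
    // + family + qualifier + timestamp(8) + keyType(1) + value
    static int keyValueLength(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return 4 + 4 + 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1 + valueLen;
    }

    // Writes one cell in that layout. The row, family and qualifier are repeated
    // in full for every cell, which is where the duplication comes from.
    static byte[] encode(byte[] row, byte[] family, byte[] qualifier, long ts, byte[] value) {
        int keyLen = 2 + row.length + 1 + family.length + qualifier.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLen + value.length);
        buf.putInt(keyLen).putInt(value.length);
        buf.putShort((short) row.length).put(row);
        buf.put((byte) family.length).put(family);
        buf.put(qualifier);
        buf.putLong(ts).put((byte) 4); // type byte; 4 is assumed here to mean "Put"
        buf.put(value);
        return buf.array();
    }

    // Gzips a byte array in memory.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Builds synthetic data: many rows, same family, same small set of qualifiers.
    static byte[] sampleCells(int rows, int columnsPerRow) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
        for (int r = 0; r < rows; r++) {
            byte[] row = String.format("customer-%010d", r).getBytes(StandardCharsets.UTF_8);
            for (int c = 0; c < columnsPerRow; c++) {
                byte[] qualifier = ("metric_" + c).getBytes(StandardCharsets.UTF_8);
                byte[] value = Integer.toString(c % 10).getBytes(StandardCharsets.UTF_8);
                byte[] kv = encode(row, family, qualifier, 1389800000000L, value);
                raw.write(kv, 0, kv.length);
            }
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleCells(1000, 20);
        byte[] packed = gzip(raw);
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("gzipped bytes: " + packed.length);
        System.out.printf("ratio:         %.1fx%n", (double) raw.length / packed.length);
    }
}
```

Running `main` prints the raw and gzipped sizes for the synthetic cells. If the real rows look anything like this (long row keys, few distinct qualifiers, short values), a large ratio between the summed KeyValue lengths and the store file size is at least plausible, though comparing against an uncompressed HFile as suggested above remains the more reliable check.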
On Wed, Jan 15, 2014 at 7:43 AM, Amit Sela <[EMAIL PROTECTED]> wrote:

> I'm talking about the store files size and the ratio between store file
> size and the byte count as counted in PutSortReducer.
>
>
> On Wed, Jan 15, 2014 at 5:35 PM, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> > See previous discussion: http://search-hadoop.com/m/85S3A1DgZHP1
> >
> >
> > On Wed, Jan 15, 2014 at 5:44 AM, Amit Sela <[EMAIL PROTECTED]> wrote:
> >
> > > Hi all,
> > > I'm trying to measure the size (in bytes) of the data I'm about to load
> > > into HBase.
> > > I'm using bulk load with PutSortReducer.
> > > All bulk load data is loaded into new regions and not added to existing
> > > ones.
> > >
> > > In order to count the size of all KeyValues in the Put object I iterate
> > > over the Put's familyMap.values() and sum the KeyValue lengths.
> > > After loading the data, I check the region size by summing the
> > > RegionLoad.getStorefileSizeMB().
> > > Counting the Put objects size predicted ~500MB per region but in
> > > practice I got ~32MB per region.
> > > The table uses GZ compression but this cannot be the cause of such a
> > > difference.
> > >
> > > Is counting the Put's KeyValues the correct way to count a row size? Is it
> > > comparable to the store files size?
> > >
> > > Thanks,
> > > Amit.
> > >
> >
>