HBase user mailing list: larger HFile block size for very wide row?


Re: larger HFile block size for very wide row?
Sorry, I meant 1000 columns, each 2 KB, so each row is 2 MB. I guess HBase
will keep a single KV (i.e., a column rather than a row) in a block, so a
row will span multiple blocks?

My scan pattern is: I do a range scan, find the matching row keys, and
fetch the whole row for each row that matches my criteria.
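A minimal in-memory sketch of this two-phase pattern, assuming a sorted map stands in for the table. This is not the HBase client API; the class, method names, and row keys are hypothetical. Phase one range-scans row keys and keeps the matches, phase two fetches every column of each matching row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

/**
 * In-memory analogue of the access pattern (NOT the HBase client API):
 * range-scan sorted row keys, keep the keys that match a criterion,
 * then fetch every column of each matching row.
 */
public class WideRowScanModel {
    // rowKey -> (qualifier -> value); both levels sorted, as rows and columns are in HBase.
    static final NavigableMap<String, NavigableMap<String, byte[]>> table = new TreeMap<>();

    // Hypothetical sample data: 3 rows x 1000 columns x 2 KB values (~2 MB per row).
    static void loadSampleRows() {
        for (String rowKey : new String[] {"row-a1", "row-b1", "row-b2"}) {
            NavigableMap<String, byte[]> row = new TreeMap<>();
            for (int c = 0; c < 1000; c++) {
                row.put(String.format("col%04d", c), new byte[2048]);
            }
            table.put(rowKey, row);
        }
    }

    // Phase 1: scan [startRow, stopRow) and collect only the matching row keys.
    static List<String> matchingRowKeys(String startRow, String stopRow, Predicate<String> criterion) {
        List<String> keys = new ArrayList<>();
        for (String rowKey : table.subMap(startRow, true, stopRow, false).keySet()) {
            if (criterion.test(rowKey)) {
                keys.add(rowKey);
            }
        }
        return keys;
    }

    // Phase 2: fetch the whole row (all columns) for one matching key.
    static NavigableMap<String, byte[]> fetchWholeRow(String rowKey) {
        return table.get(rowKey);
    }

    public static void main(String[] args) {
        loadSampleRows();
        for (String key : matchingRowKeys("row-b", "row-c", k -> k.endsWith("2"))) {
            System.out.println(key + " -> " + fetchWholeRow(key).size() + " columns");
        }
    }
}
```

Running it prints `row-b2 -> 1000 columns`. In HBase itself the equivalent steps would go through the client's Scan and Get operations; the sketch only shows the shape of the access pattern.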

Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan

From:   lars hofhansl <[EMAIL PROTECTED]>
To:     "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>,
Date:   01/29/2014 03:49 PM
Subject:        Re: larger HFile block size for very wide row?

You mean 1000 columns? Not 1000k = 1M columns, I assume.
So you'll have 2MB KVs. That's a bit on the large side.

HBase will "grow" the block to fit the KV into it. It means you have
basically one block per KV.
I guess you address these rows via point gets (GET), and do not typically
scan through them, right?
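The "grow the block" behaviour can be sketched as a simplified packing model. This is an illustration under assumptions, not HBase's actual writer code: a block is closed only at a KeyValue boundary once it has reached the target size, so a KV is never split. The 50-byte per-KV overhead is an assumed figure; real blocks also carry headers and may be compressed or encoded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model: a data block is closed only at a KeyValue boundary,
 * once it has reached the target block size. A KeyValue is never split,
 * so an oversized one ends up in a (grown) block of its own.
 * Sizes are in bytes; the per-KV overhead used in main() is an ASSUMPTION.
 */
public class BlockPackingModel {
    /** Returns the sizes of the blocks produced for the given KV sizes. */
    static List<Long> packIntoBlocks(long[] kvSizes, long targetBlockSize) {
        List<Long> blocks = new ArrayList<>();
        long current = 0;
        for (long kv : kvSizes) {
            // Close the current block before appending if it already reached the target.
            if (current >= targetBlockSize) {
                blocks.add(current);
                current = 0;
            }
            current += kv; // the whole KV goes into one block, growing it if needed
        }
        if (current > 0) {
            blocks.add(current);
        }
        return blocks;
    }

    static long[] uniformKvs(int count, long size) {
        long[] kvs = new long[count];
        Arrays.fill(kvs, size);
        return kvs;
    }

    public static void main(String[] args) {
        final long blockSize = 64 * 1024; // default HFile block size (64K)
        final long overhead = 50;         // ASSUMED per-KV key/metadata overhead

        // Reading 1: each KV is ~2 MB -> one grown block per KV.
        List<Long> bigKvs = packIntoBlocks(uniformKvs(3, 2L * 1024 * 1024 + overhead), blockSize);
        System.out.println(bigKvs.size() + " blocks for 3 KVs of ~2 MB");

        // Reading 2: one row = 1000 columns x 2 KB values.
        List<Long> row = packIntoBlocks(uniformKvs(1000, 2048 + overhead), blockSize);
        System.out.println(row.size() + " blocks for one ~2 MB row");
    }
}
```

With these assumptions, three ~2 MB KVs give three blocks (one grown block each), while a row of 1000 KVs of ~2 KB packs 32 KVs per 64 KB block and spans 32 blocks, the last one holding only 8 KVs.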

Do you see any performance issues?

-- Lars

________________________________
 From: Wei Tan <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Wednesday, January 29, 2014 12:35 PM
Subject: larger HFile block size for very wide row?
 

Hi, I have an HBase table where each row has ~1000k columns, ~2K each. My
table scan pattern is to use a row key filter, but I need to fetch the
whole row (~1000k columns) back.

Should I set the HFile block size larger than the default 64K?
Thanks,
Wei
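A back-of-the-envelope comparison of candidate block sizes for one ~2 MB row, assuming 1000 values of 2 KB and ignoring key overhead, block headers, compression, and encoding. The class and the candidate sizes are illustrative. Larger blocks mean fewer blocks per full-row read, while an uncached single-cell read still pulls in roughly one whole block.

```java
/**
 * Rough estimate only: assumes a row of 1000 columns x 2 KB values and
 * ignores per-KV key overhead, block headers, compression, and encoding.
 */
public class BlockSizeTradeoff {
    static final long ROW_BYTES = 1000L * 2048; // ~2 MB per row (numbers from the thread)

    /** Approximate number of data blocks a full-row read touches. */
    static long blocksPerRow(long blockSize) {
        return (ROW_BYTES + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long[] candidates = {64L * 1024, 128L * 1024, 256L * 1024, 1024L * 1024};
        for (long bs : candidates) {
            // An uncached single-cell GET still reads about one whole block.
            System.out.printf("block=%4d KB  blocks/row=%2d  ~bytes read per uncached single-cell GET=%4d KB%n",
                    bs / 1024, blocksPerRow(bs), bs / 1024);
        }
    }
}
```

This gives 32, 16, 8, and 2 blocks per row for 64 KB, 128 KB, 256 KB, and 1 MB blocks. The block size is a per-column-family setting (BLOCKSIZE in the HBase shell, HColumnDescriptor#setBlocksize in the Java API), so whether a larger value pays off depends on how much of the workload is whole-row reads versus point GETs of individual cells.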


 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB