HBase, mail # user - column count guidelines

Re: column count guidelines
Asaf Mesika 2013-02-08, 16:25
Can you elaborate on those features? I thought 0.94.4 was just for bug fixes.


On 8 Feb 2013, at 02:34, Ted Yu <[EMAIL PROTECTED]> wrote:

How many column families are involved?

Have you considered upgrading to 0.94.4, where you would be able to benefit
from lazy seek, Data Block Encoding, etc.?


On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[EMAIL PROTECTED]> wrote:

I'm looking for some advice about per row CQ (column qualifier) count
guidelines. Our current schema design means we have a HIGHLY variable CQ
count per row -- some rows have one or two CQs and some rows have upwards
of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and
the cell values are null.  We see highly variable and too often
unacceptable read performance using this schema.  I don't know for a fact
that the CQ count variability is the source of our problems, but I am

I'm curious about others' experience with CQ counts per row -- are there
some best practices/guidelines about how to optimally size the number of
CQs per row? The other obvious solution will involve breaking this data
into finer grained rows, which means shifting from GETs to SCANs - are
there performance trade-offs in such a change?

We are currently using CDH3u4, if that is relevant. All of our loading is
done via HFile bulk loading, so we have not had to tune write performance
beyond using bulk loads. Any advice appreciated, including what metrics we
should be looking at to further diagnose our read performance challenges.

Mike Ellery
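
For readers following the thread, here is a minimal sketch of the "finer grained rows" idea raised above: each column qualifier is folded into the row key, so what was one GET on a wide row becomes a bounded range scan over adjacent tall rows. This is plain Python over a sorted list, not the HBase client API. The separator byte, the function names, and the sample keys are all assumptions made for illustration; nothing here comes from the poster's actual schema.

```python
from bisect import bisect_left

# Hypothetical separator; assumes this byte never occurs in an original row key.
SEP = b"\x00"

def tall_key(row: bytes, cq: bytes) -> bytes:
    """Fold the column qualifier into the row key (one cell per tall row)."""
    return row + SEP + cq

def prefix_scan(sorted_keys, row: bytes):
    """Return the qualifiers stored under `row`, using a start/stop key range.

    Mimics a scan with start key `row + SEP` (inclusive) and a stop key one
    byte past the separator (exclusive) over lexicographically sorted keys.
    """
    start = row + SEP
    stop = row + b"\x01"  # smallest key that sorts after every key prefixed by row + SEP
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [key[len(start):] for key in sorted_keys[lo:hi]]

# Sample data: a store keeps keys sorted, which we model with sorted().
cells = [(b"user1", b"a"), (b"user1", b"b"), (b"user10", b"x"), (b"user2", b"a")]
keys = sorted(tall_key(row, cq) for row, cq in cells)

print(prefix_scan(keys, b"user1"))
```

The separator is what keeps `user10` out of the range for `user1`; without it, a bare prefix range would pick up both. Whether a scan over many small rows actually reads faster than a GET on one very wide row is the open question of this thread, and the sketch makes no claim either way.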