Michael Ellery 2013-02-07, 23:47
Ted Yu 2013-02-08, 00:34
Michael Ellery 2013-02-08, 01:02
Ted Yu 2013-02-08, 01:09
Michael Ellery 2013-02-08, 04:34
Marcos Ortiz 2013-02-08, 05:38
Dave Wang 2013-02-08, 16:58
Marcos Ortiz Valmaseda 2013-02-08, 01:08
-Re: column count guidelines
Asaf Mesika 2013-02-08, 16:25
Can you elaborate on those features? I thought 0.94.4 was just for bug fixes.
On 8 Feb 2013, at 02:34, Ted Yu <[EMAIL PROTECTED]> wrote:
How many column families are involved?
Have you considered upgrading to 0.94.4, where you would be able to benefit
from lazy seek, Data Block Encoding, etc.?
On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[EMAIL PROTECTED]> wrote:
I'm looking for some advice about per row CQ (column qualifier) count
guidelines. Our current schema design means we have a HIGHLY variable CQ
count per row -- some rows have one or two CQs and some rows have upwards
of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and
the cell values are null. We see highly variable, and too often
unacceptable, read performance using this schema. I don't know for a fact
that the CQ count variability is the source of our problems, but I am
I'm curious about others' experience with CQ counts per row -- are there
some best practices/guidelines about how to optimally size the number of
CQs per row. The other obvious solution will involve breaking this data
into finer grained rows, which means shifting from GETs to SCANs - are
there performance trade-offs in such a change?
We are currently using CDH3u4, if that is relevant. All of our loading is
done via HFILE loading (bulk), so we have not had to tune write performance
beyond using bulk loads. Any advice appreciated, including what metrics we
should be looking at to further diagnose our read performance challenges.
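
For illustration only, here is a rough sketch of what "finer grained rows" could look like: each former CQ becomes its own row, and the old single-row GET becomes a prefix SCAN over a key range. Nothing here comes from the thread; the separator byte, the function names and the sample ids are all assumptions, and no HBase client is involved. The sketch relies only on the fact that HBase orders row keys lexicographically by byte, which Python's bytes comparison mirrors.

```python
from typing import Tuple

# Assumed separator between entity id and former column qualifier.
# This sketch assumes entity ids never contain this byte.
SEP = b"\x00"


def tall_row_key(entity_id: bytes, qualifier: bytes) -> bytes:
    """Build the row key for one former CQ in a tall (one cell per row) layout."""
    return entity_id + SEP + qualifier


def prefix_scan_range(entity_id: bytes) -> Tuple[bytes, bytes]:
    """Return (start_row, stop_row) covering every row of one entity.

    start_row is inclusive and stop_row is exclusive, matching the usual
    scan convention. stop_row is the prefix with its last byte incremented.
    """
    start = entity_id + SEP
    stop = entity_id + bytes([SEP[0] + 1])
    return start, stop


def rows_in_range(sorted_keys, start: bytes, stop: bytes):
    """Simulate a scan over an already sorted list of row keys."""
    return [k for k in sorted_keys if start <= k < stop]


# Hypothetical sample data: three entities with a few qualifiers each.
table = sorted(
    tall_row_key(entity, cq)
    for entity, cqs in [
        (b"user41", [b"x"]),
        (b"user42", [b"b", b"a", b"c"]),
        (b"user420", [b"a"]),
    ]
    for cq in cqs
)

start_row, stop_row = prefix_scan_range(b"user42")
user42_rows = rows_in_range(table, start_row, stop_row)
```

With this layout a reader can bound the scan to a sub-range of qualifiers instead of pulling back one very large row, at the cost of losing the single-row atomicity that a wide row provides.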
Ted Yu 2013-02-08, 17:50