Re: column count guidelines
Marcos Ortiz 2013-02-08, 05:38
My recommendation is to stay current with the latest HBase release and
wait for 0.96, which has a lot of improvements in almost every area. I
talked about this in a …
I think that in your use case, Coprocessors can be very helpful. In Lars
George's book "HBase: The Definitive Guide", he explains in Chapter 4 how
to use Counters and Coprocessors. You should read it.
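For reference, a minimal sketch of the counter API that chapter covers
(the table, family and qualifier names below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "metrics"); // hypothetical table name
        try {
            // Atomically adds 1 server-side and returns the new total;
            // the counter cell is created on first use.
            long hits = table.incrementColumnValue(
                Bytes.toBytes("row-1"), Bytes.toBytes("d"),
                Bytes.toBytes("hits"), 1L);
            System.out.println("hits = " + hits);
        } finally {
            table.close();
        }
    }
}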
A great introduction to Coprocessors was posted on the HBase blog, and a
great example of HBase performance tuning, including Coprocessors, was
posted by Hari Kumar from Ericsson Research on its Data and Knowledge blog.
On 02/07/2013 11:34 PM, Michael Ellery wrote:
> Thanks for reminding me of the HBase version in CDH4 - that's something we'll definitely take into consideration.
> On Feb 7, 2013, at 5:09 PM, Ted Yu wrote:
>> Thanks Michael for this information.
>> FYI CDH4 (as of now) is based on HBase 0.92.x which doesn't have the two
>> features I cited below.
>> On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery <[EMAIL PROTECTED]> wrote:
>>> There is only one CF in this schema.
>>> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
>>> cannot have cluster downtime. Our current upgrade plan involves additional
>>> hardware with side-by-side clusters until everything is exported/imported.
>>> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>>>> How many column families are involved?
>>>> Have you considered upgrading to 0.94.4, where you would be able to benefit
>>>> from lazy seek, Data Block Encoding, etc.?
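For reference, in 0.94 Data Block Encoding is set per column family. A
rough sketch of enabling FAST_DIFF on an existing family (the table name
"t" and family name "cf" are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableFastDiff {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Fetch the existing family descriptor so its other settings
            // (compression, TTL, ...) are preserved.
            HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("t"));
            HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
            // FAST_DIFF stores each key as a diff against the previous one,
            // which helps rows carrying many qualifiers.
            cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
            admin.disableTable("t");
            admin.modifyColumn("t", cf);
            admin.enableTable("t");
        } finally {
            admin.close();
        }
    }
}

Disabling and re-enabling the table around the change is the safe path
on 0.94.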
>>>> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[EMAIL PROTECTED]> wrote:
>>>>> I'm looking for some advice about per-row CQ (column qualifier) count
>>>>> guidelines. Our current schema design means we have a HIGHLY variable CQ
>>>>> count per row -- some rows have one or two CQs and some rows have on the
>>>>> order of 1 million. Each CQ is on the order of 100 bytes (for round
>>>>> numbers) and the cell values are null. We see highly variable and too
>>>>> often unacceptable read performance using this schema. I don't know for a
>>>>> fact that the CQ count variability is the source of our problems, but I
>>>>> am suspicious of it. I'm curious about others' experience with CQ counts
>>>>> per row -- are there some best practices/guidelines about how to
>>>>> optimally size the number of CQs per row? The other obvious solution will
>>>>> involve breaking this data into finer-grained rows, which means shifting
>>>>> from GETs to SCANs -- are there performance trade-offs in such a change?
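Regarding the shift from GETs to SCANs above: if the qualifier moves into
the row key, one logical wide row becomes a short contiguous scan. A rough
sketch, assuming a hypothetical row-key layout of "<entity>/<qualifier>"
in a table "t":

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TallRowScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t");
        try {
            // All keys for entity42 sort between "entity42/" (inclusive)
            // and "entity420" (exclusive), '0' being the byte after '/'.
            Scan scan = new Scan(Bytes.toBytes("entity42/"),
                                 Bytes.toBytes("entity420"));
            scan.setCaching(1000); // fetch rows in batches, not one per RPC
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    System.out.println(Bytes.toString(r.getRow()));
                }
            } finally {
                scanner.close();
            }
        } finally {
            table.close();
        }
    }
}

Since one entity's rows stay contiguous, such a scan usually touches a
single region; the cost versus a GET is an extra round trip or two, while
the region server no longer has to materialize a million-qualifier row at
once.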
>>>>> We are currently using CDH3u4, if that is relevant. All of our loading is
>>>>> done via HFile loading (bulk), so we have not had to tune write
>>>>> performance beyond using bulk loads. Any advice appreciated, including
>>>>> what metrics we should be looking at to further diagnose our read
>>>>> performance problems.
>>>>> Mike Ellery
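On the bulk-load side, for anyone following along: the final step of that
pattern is handing the generated HFiles to the table. A rough sketch, with
a made-up table name and staging path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t"); // hypothetical table name
        try {
            // Moves HFiles produced by HFileOutputFormat into the table's
            // regions; this is what the completebulkload tool does.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path("/tmp/hfiles"), table); // hypothetical path
        } finally {
            table.close();
        }
    }
}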
Marcos Ortiz Valmaseda,
Product Manager && Data Scientist at UCI
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>