HBase user mailing list: column count guidelines


Michael Ellery  2013-02-07, 23:47
Ted Yu  2013-02-08, 00:34
Michael Ellery  2013-02-08, 01:02
Ted Yu  2013-02-08, 01:09
Michael Ellery  2013-02-08, 04:34
Marcos Ortiz  2013-02-08, 05:38
Dave Wang  2013-02-08, 16:58
Marcos Ortiz Valmaseda  2013-02-08, 01:08
Asaf Mesika  2013-02-08, 16:25
Re: column count guidelines
The reason I mentioned 0.94.4 was that it is the most recent 0.94 release.

For the features, you can refer to the following JIRAs:
HBASE-4465 Lazy-seek optimization for StoreFile scanners
HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

Cheers
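
As a rough, hypothetical sketch of how the HBASE-4218 encoding is switched on (not something from this thread; it assumes the 0.94 Java admin API, and the table name "mytable" and family "d" are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableDataBlockEncoding {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Placeholder table/family names; disable the table before altering schema.
      admin.disableTable("mytable");
      HColumnDescriptor cf = admin.getTableDescriptor(Bytes.toBytes("mytable"))
          .getFamily(Bytes.toBytes("d"));
      // FAST_DIFF tends to suit long, repetitive qualifiers with empty values;
      // PREFIX is another candidate worth benchmarking.
      cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
      admin.modifyColumn("mytable", cf);
      admin.enableTable("mytable");
    } finally {
      admin.close();
    }
  }
}

Newly flushed and compacted HFiles are written with the chosen encoding, so existing data picks it up as compactions rewrite it.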

On Fri, Feb 8, 2013 at 8:25 AM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> Can you elaborate more on those features? I thought 0.94.4 was just for
> bug fixes.
>
> Sent from my iPhone
>
> On 8 Feb 2013, at 02:34, Ted Yu <[EMAIL PROTECTED]> wrote:
>
> How many column families are involved?
>
> Have you considered upgrading to 0.94.4, where you would be able to benefit
> from lazy seek, Data Block Encoding, etc.?
>
> Thanks
>
> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[EMAIL PROTECTED]>
> wrote:
>
> I'm looking for some advice about per row CQ (column qualifier) count
> guidelines. Our current schema design means we have a HIGHLY variable CQ
> count per row -- some rows have one or two CQs and some rows have upwards
> of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and
> the cell values are null. We see highly variable and too often
> unacceptable read performance using this schema. I don't know for a fact
> that the CQ count variability is the source of our problems, but I am
> suspicious.
>
> I'm curious about others' experience with CQ counts per row -- are there
> some best practices/guidelines about how to optimally size the number of
> CQs per row? The other obvious solution will involve breaking this data
> into finer-grained rows, which means shifting from GETs to SCANs -- are
> there performance trade-offs in such a change?
>
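
To make that GET-vs-SCAN trade-off concrete, here is a rough, hypothetical sketch against the era's Java client API: the current wide-row layout reads one entity with a single Get, while a finer-grained layout keyed as "<entity>#<item>" reads the same data with a bounded Scan over the key prefix. All table, family, and key names below are made up.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class WideVsTallRead {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    try {
      // Wide-row layout: one row per entity, one CQ per item -> a single Get.
      Get get = new Get(Bytes.toBytes("entity123"));
      get.addFamily(Bytes.toBytes("d"));
      Result wideRow = table.get(get);

      // Finer-grained layout: one row per (entity, item), key "entity123#<item>".
      // The same read becomes a bounded scan; '$' is the byte after '#', so the
      // stop row excludes every other entity's keys.
      Scan scan = new Scan(Bytes.toBytes("entity123#"), Bytes.toBytes("entity123$"));
      scan.setCaching(1000); // rows fetched per RPC; tune to row size
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result narrowRow : scanner) {
          // process one fine-grained row per item
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

The scan pays extra RPC round trips and repeats the entity prefix in every key, but each row stays small enough to be read and cached on its own, which is usually the point of such a split.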
> We are currently using CDH3u4, if that is relevant. All of our loading is
> done via HFILE loading (bulk), so we have not had to tune write performance
> beyond using bulk loads. Any advice appreciated, including what metrics we
> should be looking at to further diagnose our read performance challenges.
>
> Thanks,
> Mike Ellery
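
On the bulk-load path mentioned above, a minimal sketch of handing prepared HFiles to the cluster; the staging path and table name are hypothetical, and the HFiles are assumed to have been written by HFileOutputFormat with one subdirectory per column family.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadHFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");
    try {
      // Hypothetical staging directory of HFiles, one subdirectory per family.
      LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
      loader.doBulkLoad(new Path("/staging/mytable-hfiles"), table);
    } finally {
      table.close();
    }
  }
}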