HBase user mailing list: column count guidelines


Michael Ellery   2013-02-07, 23:47
Ted Yu           2013-02-08, 00:34
Michael Ellery   2013-02-08, 01:02

Re: column count guidelines
Ted Yu wrote:
Thanks, Michael, for this information.

FYI, CDH4 (as of now) is based on HBase 0.92.x, which doesn't have the two
features I cited below.

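For reference: lazy seek needs no configuration once a cluster is on
0.94+, while Data Block Encoding is enabled per column family. A minimal
sketch against the 0.94 Java client follows; the table name "mytable" and
family name "cf" are placeholder names, not anything from this thread.

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
  import org.apache.hadoop.hbase.util.Bytes;

  public class EnableFastDiff {
    public static void main(String[] args) throws Exception {
      HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
      try {
        // Fetch the live schema so other family settings are preserved.
        HTableDescriptor desc =
            admin.getTableDescriptor(Bytes.toBytes("mytable"));
        HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));

        // FAST_DIFF stores each key as a delta against the previous one,
        // which pays off when many qualifiers share the same row key.
        cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);

        admin.disableTable("mytable");     // offline schema change
        admin.modifyColumn("mytable", cf);
        admin.enableTable("mytable");
      } finally {
        admin.close();
      }
    }
  }

Existing HFiles pick up the encoding as compactions rewrite them; lazy
seek kicks in automatically on 0.94+ with no schema change.
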
On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery <[EMAIL PROTECTED]> wrote:

> There is only one CF in this schema.
>
> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
> cannot have cluster downtime. Our current upgrade plan involves additional
> hardware, with side-by-side clusters until everything is exported/imported.
>
> Thanks,
> Mike
>
> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>
> > How many column families are involved?
> >
> > Have you considered upgrading to 0.94.4, where you would be able to
> > benefit from lazy seek, Data Block Encoding, etc.?
> >
> > Thanks
> >
> > On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[EMAIL PROTECTED]> wrote:
> >
> >> I'm looking for some advice about per-row CQ (column qualifier) count
> >> guidelines. Our current schema design means we have a HIGHLY variable
> >> CQ count per row -- some rows have one or two CQs and some have
> >> upwards of 1 million. Each CQ is on the order of 100 bytes (for round
> >> numbers) and the cell values are null. We see highly variable and too
> >> often unacceptable read performance using this schema. I don't know
> >> for a fact that the CQ count variability is the source of our
> >> problems, but I am suspicious.
> >>
> >> I'm curious about others' experience with CQ counts per row -- are
> >> there any best practices/guidelines for how to optimally size the
> >> number of CQs per row? The other obvious solution would involve
> >> breaking this data into finer-grained rows, which means shifting from
> >> GETs to SCANs -- are there performance trade-offs in such a change?
> >>
> >> We are currently using CDH3u4, if that is relevant. All of our
> >> loading is done via HFile loading (bulk), so we have not had to tune
> >> write performance beyond using bulk loads. Any advice is appreciated,
> >> including what metrics we should be looking at to further diagnose
> >> our read performance challenges.
> >>
> >> Thanks,
> >> Mike Ellery
>
>
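
On the GETs-versus-SCANs question: a minimal sketch of the finer-grained
layout, assuming a hypothetical row-key scheme of <entityId>\x00<qualifier>
in a placeholder table "mytable_narrow". Each former qualifier becomes its
own row, and the wide-row Get becomes a short prefix Scan (0.90/CDH3u4-era
client API):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PrefixScan {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "mytable_narrow"); // placeholder name
      try {
        // Rows are assumed keyed as "<entityId>\x00<qualifier>".
        byte[] prefix = Bytes.add(Bytes.toBytes("entity42"), new byte[] { 0 });

        Scan scan = new Scan();
        scan.setStartRow(prefix);
        // Stop row = prefix with its last byte bumped, so the scan covers
        // exactly the rows sharing the prefix (safe here: last byte is 0x00).
        byte[] stop = prefix.clone();
        stop[stop.length - 1]++;
        scan.setStopRow(stop);
        scan.setCaching(1000); // batch rows per RPC instead of one at a time

        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            // Each result is one former qualifier of the wide row.
            System.out.println(Bytes.toString(r.getRow()));
          }
        } finally {
          scanner.close();
        }
      } finally {
        table.close();
      }
    }
  }

The trade-off is scanner open/next/close RPCs in place of a single Get,
but rows stay small enough to split across regions; with scanner caching
set, a short prefix scan can land in the same ballpark as the wide-row Get
it replaces.
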
Michael Ellery           2013-02-08, 04:34
Marcos Ortiz             2013-02-08, 05:38
Dave Wang                2013-02-08, 16:58
Marcos Ortiz Valmaseda   2013-02-08, 01:08
Asaf Mesika              2013-02-08, 16:25
Ted Yu                   2013-02-08, 17:50