HBase, mail # user - When to expand vertically vs. horizontally in Hbase

Re: When to expand vertically vs. horizontally in Hbase
Asaf Mesika 2013-07-03, 19:42
Do you have only 5 static author names?
Keep in mind the column family name is defined when creating the table.

Regarding the tall vs. wide debate:
HBase is first and foremost a key-value database, so reads and writes happen
at the column-value level and it doesn't really care about rows.
But that's not entirely true. Rows come into play in the following situations:
- Splitting: a region is split per row and not per column, so a row is always
  saved as a whole in one region. If you have a really large row, the region
  size granularity depends on it. That doesn't seem to be the case here.
- Locking: a Put/Delete holds a lock until it finishes. If you do intensive
  inserts to the same row at the same time, this might be bad for you.
  Keeping your rows slimmer can reduce contention, but again, only if you
  make a lot of concurrent modifications to the same row.
- Filtering: if you need a filter that needs the whole row (there is a method
  you override in Filter to mark that), then a fat row will be more memory
  intensive. If you only need 1/5 of your row, then maybe splitting it into 5
  rows to begin with would make a better schema design in terms of memory
  and I/O.
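
To make the row-size point concrete, here is a minimal sketch in plain Java. It is a toy model, not the HBase client API: a sorted map stands in for the table, and every name in it (TallVsWideSketch, fill, largestRow, scanPrefix, the "d:title" column, the "/" key separator) is made up for illustration. It stores the same 5 authors x 1000 books once as a single wide row and once as many tall rows, then compares how big the largest row gets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: row key -> ("family:qualifier" -> value), both sorted.
// None of these names come from the HBase client API.
class TallVsWideSketch {

    static void put(TreeMap<String, TreeMap<String, String>> table,
                    String row, String column, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Builds the same data set in either layout.
    static TreeMap<String, TreeMap<String, String>> fill(boolean tall, int authors, int booksPerAuthor) {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        for (int a = 1; a <= authors; a++) {
            for (int b = 0; b < booksPerAuthor; b++) {
                String author = "author" + a;
                String book = "book" + b;
                if (tall) {
                    // Tall: author and book move into the row key; one cell per row.
                    put(table, "history/" + author + "/" + book, "d:title", "some title");
                } else {
                    // Wide: one row per genre; author is the family, book the qualifier.
                    put(table, "history", author + ":" + book, "some title");
                }
            }
        }
        return table;
    }

    // Cells in the biggest row. A row is never split across regions, and a
    // whole-row filter has to hold this many cells at once.
    static int largestRow(TreeMap<String, TreeMap<String, String>> table) {
        int max = 0;
        for (TreeMap<String, String> columns : table.values()) {
            max = Math.max(max, columns.size());
        }
        return max;
    }

    // In the tall layout, "all books of one author" becomes a row-key prefix scan.
    static List<String> scanPrefix(TreeMap<String, TreeMap<String, String>> table, String prefix) {
        List<String> rows = new ArrayList<>();
        for (String row : table.tailMap(prefix, true).keySet()) {
            if (!row.startsWith(prefix)) {
                break; // keys are sorted, so the matching range has ended
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> wide = fill(false, 5, 1000);
        TreeMap<String, TreeMap<String, String>> tall = fill(true, 5, 1000);
        System.out.println("wide: rows=" + wide.size() + " largestRow=" + largestRow(wide));
        System.out.println("tall: rows=" + tall.size() + " largestRow=" + largestRow(tall));
        System.out.println("author2 books (tall): " + scanPrefix(tall, "history/author2/").size());
    }
}
```

In the wide layout every cell lands in one row, so that row is the unit for splitting, locking and whole-row filtering. In the tall layout each row holds a single cell and per-author reads turn into prefix scans.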

On Wednesday, July 3, 2013, Aji Janis wrote:

> I have a major typo in the question so I apologize. I meant to say 5
> families with 1000+ qualifiers each.
> Let's work with an example (not the greatest example here, but still). Let's
> say we have a Genre class like this:
> class HistoryBooks {
>  ArrayList<Books> author1;
>  ArrayList<Books> author2;
>  ArrayList<Books> author3;
>  ArrayList<Books> author4;
>  ArrayList<Books> author5;
> ...}
> Each author is a column family (let's say we only allow 5 authors per
> <T>Book class). Book per author ends up being the qualifier. In this case, I
> know I have a max family count but my qualifiers have no upper limit. So is
> this scenario a case for a tall or a wide table? Why? Thank you.
> On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault
> <[EMAIL PROTECTED]> wrote:
> > If they are accessed mostly together they should all be a single column
> > family. The key with tall or wide is based on the total byte size of each
> > KeyValue. Your cells would need to be quite large for 50 to become a
> > problem. I still would recommend using a single CF though.
> >
> > On Tue, Jul 2, 2013 at 9:33 AM, Aji Janis <[EMAIL PROTECTED]> wrote:
> >
> > > The section on Rows vs. Columns at
> > > http://hbase.apache.org/book/schema.smackdown.html talks about expanding
> > > horizontally vs. vertically.
> > > Can someone please explain to me when to choose rows vs. columns? The
> > > section reads, "To be clear, this guideline is in the context is in
> > > extremely wide cases, not in the standard use-case where one needs to store
> > > a few dozen or hundred columns" so if I had 5 column families with 10
> > > qualifiers each, accessed mostly together, is this a case for a wider or
> > > a taller table? Thanks for any help in advance.
> >