HBase, mail # user - When to expand vertically vs. horizontally in Hbase


Re: When to expand vertically vs. horizontally in Hbase
Michael Segel 2013-07-05, 16:07
Why do you have so many column families (CFs)?

It's not a question of physical limitations, but of data design.

There aren't many good examples of designs that genuinely require more than a handful of CFs.

When I teach or lecture, the example I use is an order entry system, where you would have the same key on order entry, pick slips, shipping, and invoice.

That's probably the best example of where CFs come into play.
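
To make the order entry example concrete, here is a minimal sketch of one row whose key is shared by four column families. It is a toy in-memory model, not the HBase client API: the class `OrderRowSketch`, the family names, the qualifiers, and the row key are all hypothetical. The fixed family list mirrors the point made elsewhere in this thread that families are defined when the table is created.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of one row with column families (not the HBase API). */
class OrderRowSketch {
    // Hypothetical family names, fixed up front like families at table creation.
    static final String[] FAMILIES = {"order", "pick", "ship", "invoice"};

    private final String rowKey;
    // family -> (qualifier -> value)
    private final Map<String, Map<String, String>> families = new TreeMap<>();

    OrderRowSketch(String rowKey) {
        this.rowKey = rowKey;
        for (String f : FAMILIES) {
            families.put(f, new TreeMap<>());
        }
    }

    /** Writes one cell; rejects families that were not declared up front. */
    void put(String family, String qualifier, String value) {
        Map<String, String> cf = families.get(family);
        if (cf == null) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        cf.put(qualifier, value);
    }

    /** Reads a single family without touching the others. */
    Map<String, String> getFamily(String family) {
        return families.get(family);
    }

    String rowKey() {
        return rowKey;
    }

    public static void main(String[] args) {
        OrderRowSketch row = new OrderRowSketch("order-000123");
        row.put("order", "customer", "ACME");
        row.put("ship", "carrier", "UPS");
        row.put("invoice", "total", "99.50");
        System.out.println(row.rowKey() + " ship=" + row.getFamily("ship"));
    }
}
```

Each stage of the order's life writes to its own family under the same key, so a reader interested only in shipping data asks for the `ship` family alone.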

I'd suggest that you go back and rethink the design if you have more than a handful.
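
As one illustration of what a redesign could look like for the Genre/Author example quoted below, the author and book can move out of the column family and qualifier and into a composite row key (a "tall" layout). This is only a sketch under assumptions: `TallTableSketch`, the `|` separator, and the key layout are hypothetical, and a `TreeMap` stands in for HBase's sorted row keys (HBase compares raw bytes; this toy compares Java strings).

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of a tall table: one row per (genre, author, book). */
class TallTableSketch {
    static final char SEP = '|'; // hypothetical separator, must not occur in the parts

    /** Builds the composite row key genre|author|book. */
    static String tallKey(String genre, String author, String book) {
        return genre + SEP + author + SEP + book;
    }

    /** Returns all rows whose key starts with the prefix, using the sort order. */
    static SortedMap<String, String> prefixScan(TreeMap<String, String> table, String prefix) {
        // Range [prefix, prefix + max char) covers every key that begins with prefix.
        return table.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(tallKey("history", "author1", "bookA"), "book data");
        table.put(tallKey("history", "author1", "bookB"), "book data");
        table.put(tallKey("history", "author2", "bookC"), "book data");

        // All books by author1 in the history genre, found by key prefix.
        System.out.println(prefixScan(table, "history|author1|").keySet());
    }
}
```

With this layout the number of authors is no longer tied to the set of column families, and "all books for one author" becomes a range scan over adjacent keys rather than a read of one very wide row.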

On Jul 5, 2013, at 8:53 AM, Aji Janis <[EMAIL PROTECTED]> wrote:

> Asaf,
>
> I am using the Genre/Author stuff as an example, but yes, at the moment I
> only have 5 column families. However, over time I may have more (no upper
> limit decided at this point). See below for more responses.
>
>
> On Wed, Jul 3, 2013 at 3:42 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:
>
>> Do you have only 5 static author names?
>> Keep in mind the column family name is defined when creating the table.
>>
>> Regarding the tall vs. wide debate:
>> HBase is first and foremost a key-value database, so it reads and writes
>> at the column-value level and doesn't really care about rows.
>> But that's not entirely true. Rows come into play in the following
>> situations:
>> Splitting a region is per row and not per column, so a row is always saved
>> as a whole on one region. If you have a really large row, the region size
>> granularity is dependent on it. It doesn't seem to be the case here.
>> Put/Delete holds a lock on the row until finished. If you do intensive
>> inserts to the same row at the same time, this might be bad for you.
>> Keeping your rows slimmer can reduce contention, but again, only if you
>> make a lot of concurrent modifications to the same row.
>>
>
> I expect batches of Put/Delete to the same row to come from at most one
> thread at a time, based on the user's current behavior, so locking
> shouldn't be an issue. However, I am not sure whether the point about a row
> being saved whole on one region is really something I need to worry about
> (probably because I just don't know much about it yet).
>
>
>> Filtering - if you need a filter which needs the whole row (there is a
>> method you override in Filter to mark that), then a fat row will be more
>> memory intensive. If you need only 1/5 of your row, then maybe splitting it
>> into 5 rows to begin with would make a better schema design in terms of
>> memory and I/O.
>>
>
> Currently, my access pattern is to get all data for a given row. It's
> possible that in the future we may want to apply (family/qualifier) filters.
> There is a lot of uncertainty about use cases (client side) at this point,
> which is why I am not entirely sure how things will look months from
> now. I am not sure I follow this statement:
>
> "if you need a filter which needs the whole row (there is a method you
> override in Filter to mark that), then a fat row will be more memory
> intensive."
>
> Can you please explain? Thank you for these suggestions btw, good food for
> thought!
>
>
>>
>> On Wednesday, July 3, 2013, Aji Janis wrote:
>>
>>> I have a major typo in the question so I apologize. I meant to say 5
>>> families with 1000+ qualifiers each.
>>>
>>> Let's work with an example (not the greatest example, but still). Let's
>>> say we have a Genre class like this:
>>>
>>> class HistoryBooks {
>>>
>>>     ArrayList<Books> author1;
>>>     ArrayList<Books> author2;
>>>     ArrayList<Books> author3;
>>>     ArrayList<Books> author4;
>>>     ArrayList<Books> author5;
>>>
>>>     ...
>>> }
>>>
>>> Each author is a column family (let's say we only allow 5 authors per
>>> <T>Book class). The book per author ends up being the qualifier. In this
>>> case, I know I have a max family count, but my qualifiers have no upper
>>> limit. So is this scenario a case for a tall or a wide table? Why? Thank
>>> you.
>>>
>>>
>>> On Tue, Jul 2, 2013 at 9:56 AM, Bryan Beaudreault
>>> <[EMAIL PROTECTED]> wrote: