HBase, mail # user - Multiple tables vs big fat table


Re: Multiple tables vs big fat table
Amandeep Khurana 2011-11-21, 01:36
Mark,

This is an interesting discussion and like Michel said - the answer to your
question depends on what you are trying to achieve. However, here are the
points that I would think about:

What are the access patterns of the various buckets of data that you want to
put in HBase? For instance, would the SearchLog and PageViewLog tables be
accessed together all the time? Would they be primarily scanned, or just
used for random lookups? What are the cache requirements? Are both going to
be read and written equally? Ideally, you want to store data with different
access patterns in separate tables.

Then, what kind of schema are you looking at? When I say schema, I mean
keys and column families. Now, if you concatenate the three tables you
mentioned into one and, let's say, your keys are prefixed with the type of
data:

S<id>
P<id>
L<id>

you will end up using some servers more than others for different parts of
the data. In theory, that should not happen, but in most practical
scenarios, when splitting happens, the resulting regions tend to stick
together on the same servers. There are ways to work around that as well.
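
To make the key layout concrete, here is a minimal sketch in plain Python
(no HBase client involved). It shows how type-prefixed keys sort into one
contiguous run per type, and how one commonly used workaround, salting the
key with a bucket prefix, interleaves the types instead. Salting is only one
option and not necessarily the workaround meant above. The bucket count, the
zero-padded integer ids, and the function names are all assumptions made for
illustration.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for illustration


def prefixed_key(action_type: str, record_id: int) -> str:
    """Key layout from the message: type letter followed by the id."""
    return f"{action_type}{record_id:06d}"


def salted_key(action_type: str, record_id: int, buckets: int = NUM_BUCKETS) -> str:
    """Prepend a bucket derived from the id (assumes integer ids)."""
    bucket = record_id % buckets
    return f"{bucket:02d}{prefixed_key(action_type, record_id)}"


def types_per_bucket(keys, prefix_len: int = 2):
    """Group salted keys by bucket prefix and collect the type letters in each."""
    groups = defaultdict(set)
    for key in keys:
        groups[key[:prefix_len]].add(key[prefix_len])
    return dict(groups)


# Eight hypothetical records per type: S (search), P (page view), L (login).
records = [(t, i) for t in "SPL" for i in range(8)]

# Row keys are kept in sorted order, so sort to mimic the on-disk layout.
plain = sorted(prefixed_key(t, i) for t, i in records)
salted = sorted(salted_key(t, i) for t, i in records)

# Plain keys form three contiguous runs, one per type (L, then P, then S).
print([k[0] for k in plain])
# Salted keys mix all three types inside every bucket.
print(types_per_bucket(salted))
```

The trade-off is on the read side: with salted keys, reading all records of
one type takes one scan per bucket rather than a single prefix scan.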

Like Lars said, it's okay to have multiple tables. But you don't want to
end up with 100s of tables. Ideally, you want to choose the number of
tables based on the access patterns.

Again, this discussion will be kind of abstract without a specific example.
:)

-ak
On Fri, Nov 18, 2011 at 1:29 PM, Mark <[EMAIL PROTECTED]> wrote:

> Is it better to have many smaller tables or one larger table? For example,
> if we wanted to store user action logs we could do either of the following:
>
> Multiple tables:
>  - SearchLog
>  - PageViewLog
>  - LoginLog
>
> or
>
> One table:
>  - ActionLog where the key could be a concatenation of the action type, i.e.
> (search, pageview, login)
>
> Any ideas? Are there any performance considerations on having multiple
> smaller tables?
>
> Thanks
>
>