HBase, mail # user - Number of column families vs Number of column family qualifiers


Re: Number of column families vs Number of column family qualifiers
Andrey Stepachev 2010-10-11, 16:27
2010/10/11 Jean-Daniel Cryans <[EMAIL PROTECTED]>:
> On Mon, Oct 11, 2010 at 4:20 AM, Andrey Stepachev <[EMAIL PROTECTED]> wrote:
>> Hi.
>>

Yes, I agree: an OOME is unlikely. I misinterpreted my current problem.
I found that this problem (the GC timeout) on my 0.89-stumbleupon HBase
occurs only when writeToWAL=false. My RS eats all available memory (5GB)
but doesn't get an OOME. I'm trying to figure out what is going on.
> But the "number of memstores" argument also implies that since regions
> flush on the total size of their memstores, filling up a few of them
> at the same time is very inefficient. The worst case is filling up a
> family with really big cells while also inserting much smaller cells
> into other families. In one case on a troublesome cluster I saw
> regions flushing one ~58MB file along with 5 ~100KB-1MB files.

This is my case. My design flaw was to use a separate family for each
entity (I now have 9 of them), and I got exactly what you describe.
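A toy model can make the quoted flush behaviour concrete. It is not HBase code. The class `RegionFlushModel`, the 64 MB threshold and the write pattern (per round, one 1 MB cell into a "big" family and one 10 KB cell into each of five small families) are assumptions chosen for illustration. The only rule it encodes is the one quoted above: a region flushes every family's memstore once their combined size reaches the threshold.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not HBase code) of a region that keeps one memstore per column
 * family and flushes all of them once their combined size crosses a threshold.
 */
class RegionFlushModel {
    // Hypothetical region-wide flush threshold in bytes (64 MB).
    static final long FLUSH_SIZE = 64L * 1024 * 1024;

    private final Map<String, Long> memstores = new LinkedHashMap<>();
    // Size of every file written by a flush, in the order written.
    final List<Long> flushedFiles = new ArrayList<>();

    RegionFlushModel(String... families) {
        for (String f : families) memstores.put(f, 0L);
    }

    void put(String family, long bytes) {
        memstores.merge(family, bytes, Long::sum);
        long total = memstores.values().stream().mapToLong(Long::longValue).sum();
        if (total >= FLUSH_SIZE) flushAll();
    }

    // Region-level flush: every non-empty family writes its own file, however small.
    private void flushAll() {
        for (Map.Entry<String, Long> e : memstores.entrySet()) {
            if (e.getValue() > 0) flushedFiles.add(e.getValue());
            e.setValue(0L);
        }
    }

    /** Each round writes a 1 MB cell to "big" and a 10 KB cell to each of five small families. */
    static List<Long> simulate() {
        RegionFlushModel region = new RegionFlushModel("big", "s1", "s2", "s3", "s4", "s5");
        for (int round = 0; round < 64; round++) {
            region.put("big", 1024 * 1024);
            for (int i = 1; i <= 5; i++) region.put("s" + i, 10 * 1024);
        }
        return region.flushedFiles;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // [65011712, 624640, 624640, 624640, 624640, 624640]
    }
}
```

The single flush in this run writes one 62 MB file plus five 610 KB files, the same lopsided shape as the 58 MB plus 100KB-1MB files quoted above. Adding more small families (nine, in this case) only adds more tiny files per flush.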

>
> Flushing individual families instead of whole regions would be a fix
> in this case, but it has other side effects.

Hmm, how can I flush a single family from the client side? I don't see
any API for that in 0.20.x. Is it an API change in 0.89? (I haven't dug
into 0.89 yet.)
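To contrast with the whole-region flush described in the quoted message, here is a hypothetical way a region could pick individual families to flush. It is not an API in 0.20.x or 0.89. `PerFamilyFlushPolicy`, `familiesToFlush` and the 16 MB lower bound are invented for illustration, and the family sizes in `main` are made-up values shaped like the 58 MB / 100KB-1MB case quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-family flush selection; a toy model, not an HBase API. */
class PerFamilyFlushPolicy {
    // Hypothetical lower bound: only families holding at least this much get flushed.
    static final long LOWER_BOUND = 16L * 1024 * 1024;

    /** Called once the region's combined memstore size has crossed its flush threshold. */
    static List<String> familiesToFlush(Map<String, Long> memstoreSizes) {
        List<String> chosen = new ArrayList<>();
        for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
            if (e.getValue() >= LOWER_BOUND) chosen.add(e.getKey());
        }
        // Fall back to a whole-region flush if no single family is large enough.
        if (chosen.isEmpty()) {
            for (Map.Entry<String, Long> e : memstoreSizes.entrySet()) {
                if (e.getValue() > 0) chosen.add(e.getKey());
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        final long kb = 1024, mb = 1024 * kb;
        // Illustrative sizes: one big family and five small ones.
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("big", 58 * mb);
        sizes.put("s1", 100 * kb);
        sizes.put("s2", 300 * kb);
        sizes.put("s3", 500 * kb);
        sizes.put("s4", 800 * kb);
        sizes.put("s5", 1 * mb);
        System.out.println(familiesToFlush(sizes)); // prints [big]
    }
}
```

In this model only the large family is written out. The small families stay in memory until they grow past the bound or a whole-region flush happens, which is the kind of trade-off such a policy has to weigh.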

>
> I personally don't recommend using multiple families unless they are
> used separately almost all the time.

Totally agree; I've stepped on this rake myself.
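One way to follow this advice for a schema like the nine-family one described earlier is to keep a single family and put the entity name into the qualifier. The following toy model uses plain Java, with no HBase client. It assumes an `entity:field` qualifier naming scheme. `SingleFamilyRow` and the `:` separator are hypothetical. Qualifiers within one family are kept sorted, so all of an entity's fields form one contiguous range.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Toy model of one row stored in a single column family: each entity's
 * fields become qualifiers prefixed with the entity name, instead of living
 * in an entity-specific family.
 */
class SingleFamilyRow {
    // Hypothetical separator; it must not occur in entity names.
    private static final char SEP = ':';

    // Qualifiers kept sorted, like the qualifiers of a row inside one family.
    private final SortedMap<String, String> cells = new TreeMap<>();

    static String qualifier(String entity, String field) {
        return entity + SEP + field;
    }

    void put(String entity, String field, String value) {
        cells.put(qualifier(entity, field), value);
    }

    /** All fields of one entity: the qualifiers in ["entity:", "entity;"). */
    SortedMap<String, String> entity(String entity) {
        String from = entity + SEP;
        String to = entity + (char) (SEP + 1); // ';' sorts right after ':'
        return cells.subMap(from, to);
    }

    public static void main(String[] args) {
        SingleFamilyRow row = new SingleFamilyRow();
        row.put("order", "id", "42");
        row.put("order", "total", "9.99");
        row.put("customer", "name", "Ann");
        System.out.println(row.entity("order")); // {order:id=42, order:total=9.99}
    }
}
```

Java string comparison matches HBase's byte ordering only for ASCII names, which is another assumption of this sketch. With one family, every write lands in the same memstore, so a region flush produces one file per flush instead of one per entity.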

Sorry for the wrong information.

Andrey.