Re: 2 differents hbase.hregion.max.filesize at the same time?
Hi Kevin,

Thanks for the suggestion.

I have disabled the table, set the MAX_FILESIZE value, and re-enabled the table.
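For reference, that sequence looks roughly like this in the HBase shell (a sketch; the exact `alter` syntax can vary between HBase versions):

```
hbase> disable 'work_proposed'
hbase> alter 'work_proposed', MAX_FILESIZE => '104857600'
hbase> enable 'work_proposed'
```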

I can see that in the UI:

work_proposed {NAME => 'work_proposed', MAX_FILESIZE => '104857600',
FAMILIES => [{NAME => '@'}]}

But there is still only one region in the table.

104857600 bytes is 100MB.

And here are the files in hadoop:
hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls
Found 2 items
-rw-r--r--   3 hbase supergroup 1340467822 2012-11-19 16:06
-rw-r--r--   3 hbase supergroup  834894008 2012-11-19 16:06

The smallest one is almost 800MB.

Something else which might be interesting would be a "MIN_REGIONS"
attribute, where you can set the minimum number of regions you want
for a table, without any consideration of the size of the file. The
goal here is to make sure the table is spread over enough servers to
distribute the work when there are major MapReduce jobs running...
Here, I have an 800MB file and 8 region servers. I would set the
MIN_REGIONS value to 8 and let HBase make sure there are at least 8
regions for this table....
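The manual fallback described in the quoted message below (note a key every 1/8th of the rows, then split on those keys) can be sketched quickly. Here is a small illustrative snippet; `split_indices` is a hypothetical helper for the arithmetic, not an HBase API:

```python
# Hypothetical helper (not an HBase API): given the total row count and a
# desired number of regions, compute the row offsets at which to sample
# keys that would serve as manual split points.
def split_indices(total_rows: int, num_regions: int) -> list[int]:
    step = total_rows // num_regions
    # n regions need n-1 interior boundaries
    return [i * step for i in range(1, num_regions)]

# 400M rows into 8 regions -> 7 interior boundaries
print(split_indices(400_000_000, 8))
```

The keys found at those offsets would then be passed, one by one, to the split action in the web UI (or the shell's split command).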


2012/11/19, Kevin O'dell <[EMAIL PROTECTED]>:
> JM,
>   You can go into the shell -> disable table -> alter table command and
> change MAX_FILESIZE (I think that is what it is called); this will set it
> on a per-table basis.
> On Mon, Nov 19, 2012 at 4:29 AM, Jean-Marc Spaggiari <
>> Hi,
>> I have a 400M-line table that I merged yesterday into a single
>> region. I had previously split it wrongly, so I would like HBase
>> to split it its own way.
>> The issue is that the keys in this table are very small, so the 400M
>> lines are stored in a <10G HFile.
>> I can still use the split option in the HTML interface, but I was
>> wondering if there is a way to tell HBase that the max filesize
>> for this specific table is 1G, while it remains 10G for the other tables?
>> My goal is to split this table into at least 8 pieces. Worst case,
>> since I know the number of lines, I can "simply" look at the key at
>> every 1/8th of the lines, note it, and continue, then do the splits.
>> But is there a more "automatic" way to do it?
>> Thanks,
>> JM
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera