HBase, mail # user - 2 differents hbase.hregion.max.filesize at the same time?


Jean-Marc Spaggiari 2012-11-19, 12:29
Kevin Odell 2012-11-19, 13:27
Jean-Marc Spaggiari 2012-11-19, 21:35
Re: 2 differents hbase.hregion.max.filesize at the same time?
Jean-Marc Spaggiari 2012-11-19, 21:47
OK. I ran a minor and a major compaction, and that split the table. I
now have many regions. That's perfect! I still think a MIN_REGIONS
option, or something like EVENLY_SPLIT, might be useful. But at least I
can adjust my settings with MAX_FILESIZE.
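
As a rough sanity check on the numbers quoted below, here is a small
Python sketch of the arithmetic. It only assumes that no region should
end up holding more than MAX_FILESIZE bytes; the helper names are made
up for illustration, and the real region count depends on HBase's split
policy, compactions, and key distribution, so treat the result as a
lower bound and not as a prediction.

```python
import math

MIB = 1024 * 1024

# Values copied from the quoted message below.
MAX_FILESIZE = 104857600               # per-table setting shown in the UI, in bytes
STORE_FILES = [1340467822, 834894008]  # sizes from the `hadoop fs -ls` listing, in bytes


def to_mib(n_bytes):
    """Convert a byte count to MiB."""
    return n_bytes / MIB


def min_regions(total_bytes, max_filesize):
    """Lower bound on the region count if no region may exceed max_filesize.

    Assumption for illustration only: data is spread perfectly evenly.
    """
    return math.ceil(total_bytes / max_filesize)


total_bytes = sum(STORE_FILES)

print("MAX_FILESIZE in MiB:", to_mib(MAX_FILESIZE))
for size in STORE_FILES:
    print("store file: %.1f MiB, over the limit: %s" % (to_mib(size), size > MAX_FILESIZE))
print("lower bound on regions:", min_regions(total_bytes, MAX_FILESIZE))
```

Both store files are well above the 100MB limit, so it makes sense
that the table split once the compactions had run.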

Thanks,

JM

2012/11/19, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> Hi Kevin,
>
> Thanks for the suggestion.
>
> I have disabled the table, set the MAX_FILESIZE value, and re-enabled the
> table.
>
> I can see that in the UI:
>
> work_proposed {NAME => 'work_proposed', MAX_FILESIZE => '104857600',
> FAMILIES => [{NAME => '@'}]}
>
> But there is still only one region in the table.
>
> 104857600 is 100MB
>
> And here are the files in hadoop:
> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls
> /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@
> Found 2 items
> -rw-r--r--   3 hbase supergroup 1340467822 2012-11-19 16:06
> /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@/157867160e684800946dd129900d3f77
> -rw-r--r--   3 hbase supergroup  834894008 2012-11-19 16:06
> /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@/72bb17a94dc946da8db5841a37463713
>
> The smallest one is almost 800MB.
>
> Something else that might be interesting would be to have something
> like "MIN_REGIONS", where you can set a minimum number of regions you
> want for this table, without any consideration of the size of the
> file. The goal here is to make sure the table is spread over enough
> servers to distribute the work when there are major MapReduce jobs
> running... Here, I have an 800MB file and 8 region servers. I would
> set the MIN_REGIONS value to 8 and let HBase make sure there are at
> least 8 regions for this table....
>
> JM
>
> 2012/11/19, Kevin O'dell <[EMAIL PROTECTED]>:
>> JM,
>>
>>   You can go into the shell -> disable table -> alter table command and
>> change MAX_FILESIZE (I think that is what it is called). This will set
>> it on a per-table basis.
>>
>> On Mon, Nov 19, 2012 at 4:29 AM, Jean-Marc Spaggiari <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I have a 400M-line table that I merged yesterday into a single
>>> region. I had previously split it wrongly, so I would like HBase
>>> to split it its own way.
>>>
>>> The issue is that keys are very small in this table, and the 400M-line
>>> table is stored in a <10G HFile.
>>>
>>> I can still use the split option in the HTML interface, but I was
>>> wondering if there is a way to tell HBase that the max filesize
>>> for this specific table is 1G, while it remains 10G for the other tables?
>>>
>>> My goal is to split this table into at least 8 pieces. So worst case,
>>> since I know the number of lines, I can "simply" look at x/8 lines,
>>> note the key, and continue. Then do the split. But is there a more
>>> "automatic" way to do it?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>
>>
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
>>
>
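
The manual fallback described in the original question (look at every
x/8th line, note the key, and continue) can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
function name pick_split_keys is hypothetical, the keys are assumed to
arrive in sorted order (as a full table scan would return them), and
the total number of lines is assumed to be known in advance.

```python
def pick_split_keys(sorted_keys, total_rows, pieces):
    """Return the keys found at rows total/pieces, 2*total/pieces, ...

    sorted_keys: any iterable yielding row keys in sorted order.
    total_rows:  known number of rows in the table.
    pieces:      how many roughly equal parts the table should be cut into.
    """
    if pieces < 2 or total_rows < 2:
        return []

    # Row positions where a new piece should start. Position 0 is dropped
    # because splitting at the very first key would leave an empty piece,
    # and the set removes duplicates when total_rows < pieces.
    targets = sorted({total_rows * i // pieces for i in range(1, pieces)} - {0})

    split_keys = []
    next_target = 0
    for row_index, key in enumerate(sorted_keys):
        if next_target == len(targets):
            break  # all split points found, no need to read further
        if row_index == targets[next_target]:
            split_keys.append(key)
            next_target += 1
    return split_keys


# Tiny stand-in for a table scan: 16 sorted keys, cut into 8 pieces.
demo_keys = ["row%04d" % i for i in range(16)]
demo_splits = pick_split_keys(demo_keys, len(demo_keys), 8)
print(demo_splits)
```

Eight pieces need seven split keys. The keys collected this way could
then be entered one by one through the split option in the HTML
interface mentioned above.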