HBase user mailing list: 2 differents hbase.hregion.max.filesize at the same time?


Thread:
Jean-Marc Spaggiari 2012-11-19, 12:29
Kevin Odell 2012-11-19, 13:27
Jean-Marc Spaggiari 2012-11-19, 21:35
Re: 2 differents hbase.hregion.max.filesize at the same time?
Ok. I ran minor and major compactions, and the table has split. I now
have many regions. That's perfect! I still think a MIN_REGIONS option
might be useful, or something like EVENLY_SPLIT. But at least I can
adjust my settings with MAX_FILESIZE.

Thanks,

JM
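For reference, the full sequence discussed in this thread can be sketched in the HBase shell roughly as follows. The table name and the 100MB value come from the thread itself; the exact `alter` syntax varies between HBase versions (older releases may require a `METHOD => 'table_att'` form), so treat this as a sketch rather than version-exact commands:

```
# From the HBase shell:
disable 'work_proposed'
alter 'work_proposed', MAX_FILESIZE => '104857600'   # 100MB limit for this table only
enable 'work_proposed'

# The size limit is only evaluated after a flush or compaction,
# so force one to trigger the splits:
major_compact 'work_proposed'
```

This matches what happened in the thread: the per-table MAX_FILESIZE alone did not split the existing region, and the splits only appeared after compactions ran.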

2012/11/19, Jean-Marc Spaggiari <[EMAIL PROTECTED]>:
> Hi Kevin,
>
> Thanks for the suggestion.
>
> I disabled the table, set the MAX_FILESIZE value, and enabled the
> table.
>
> I can see that in the UI:
>
> work_proposed {NAME => 'work_proposed', MAX_FILESIZE => '104857600',
> FAMILIES => [{NAME => '@'}]}
>
> But there is still only one region in the table.
>
> 104857600 is 100MB
>
> And here are the files in hadoop:
> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls
> /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@
> Found 2 items
> -rw-r--r--   3 hbase supergroup 1340467822 2012-11-19 16:06
> /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@/157867160e684800946dd129900d3f77
> -rw-r--r--   3 hbase supergroup  834894008 2012-11-19 16:06
> /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@/72bb17a94dc946da8db5841a37463713
>
> The smallest one is almost 800MB.
>
> Something which might also be interesting would be a "MIN_REGIONS"
> setting, where you could configure the minimum number of regions you
> want for a table, without any consideration of the size of the files.
> The goal here is to make sure the table is spread over enough servers
> to distribute the work when major MapReduce jobs are running... Here,
> I have an 800MB file and 8 region servers. I would set MIN_REGIONS to
> 8 and let HBase make sure there are at least 8 regions for this
> table....
>
> JM
>
> 2012/11/19, Kevin O'dell <[EMAIL PROTECTED]>:
>> JM,
>>
>>   You can go into the shell -> disable table -> alter table command and
>> change MAX_FILESIZE (I think that is what it is called). This will set
>> it on a per-table basis.
>>
>> On Mon, Nov 19, 2012 at 4:29 AM, Jean-Marc Spaggiari <
>> [EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I have a 400M-line table that I merged yesterday into a single
>>> region. I had previously split it incorrectly, so I would like
>>> HBase to split it its own way.
>>>
>>> The issue is that the keys in this table are very small, so the
>>> whole 400M-line table is stored in a <10G HFile.
>>>
>>> I can still use the split option in the HTML interface, but I was
>>> wondering if there is a way to tell HBase that the max filesize for
>>> this specific table is 1G, while it remains 10G for the other
>>> tables?
>>>
>>> My goal is to split this table into at least 8 pieces. Worst case,
>>> since I know the number of lines, I can "simply" walk the keys,
>>> note one every x/8 lines, and then do the splits. But is there a
>>> more "automatic" way to do it?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>
>>
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
>>
>
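The manual approach described in the original message (noting a key every x/8 lines) amounts to picking evenly spaced split points from the sorted row keys. A minimal sketch in plain Python, with made-up stand-in keys (the real keys would come from a table scan; the function name is illustrative, not an HBase API):

```python
def split_points(sorted_keys, num_regions):
    """Return num_regions - 1 evenly spaced split keys from a sorted key list.

    Feeding each returned key to HBase's split command yields
    roughly equal-sized regions.
    """
    step = len(sorted_keys) // num_regions
    return [sorted_keys[i * step] for i in range(1, num_regions)]

# Stand-in for the 400M real row keys mentioned in the thread:
keys = ["row%06d" % i for i in range(400)]
print(split_points(keys, 8))  # 7 split keys -> 8 regions
```

HBase also ships a `org.apache.hadoop.hbase.util.RegionSplitter` utility aimed at pre-splitting tables, which may cover part of what MIN_REGIONS is asking for, though it is geared toward choosing split keys by algorithm rather than by row count.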