HBase, mail # user - Does hbase.hregion.max.filesize have a limit?


+
Cheng Su 2012-11-01, 05:22
+
lars hofhansl 2012-11-01, 05:35
+
Cheng Su 2012-11-01, 06:45
+
Jeremy Carroll 2012-11-01, 16:39
+
Cheng Su 2012-11-02, 01:20
+
ramkrishna vasudevan 2012-11-01, 07:05
+
Cheng Su 2012-11-01, 08:09
Re: Does hbase.hregion.max.filesize have a limit?
Doug Meil 2012-11-01, 13:26

Hi there-

re:  "The max file size the whole cluster can store for one CF is 60G,
right?"

No, the max file size for a region, in your example, is 60GB.  When the
data exceeds that, the region will split, and then you'll have 2 regions,
each with a 60GB limit.
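
To make the split behaviour concrete, here is a rough sketch in Python (not HBase code). It assumes a deliberately simplified model: all data starts in one region, and any region over the limit splits into two equal halves. Real HBase decides splits per store according to the configured split policy, so treat the numbers as illustrative only. The function name and the sample sizes are made up for the example.

```python
GB = 1024 ** 3


def simulate_splits(total_bytes, max_filesize_bytes):
    """Return region sizes after halving every region that is over the limit.

    Simplified model (an assumption, not HBase's actual split logic):
    all data starts in a single region, and a region that exceeds
    max_filesize_bytes splits into two equal halves.
    """
    regions = [total_bytes]
    while any(size > max_filesize_bytes for size in regions):
        next_regions = []
        for size in regions:
            if size > max_filesize_bytes:
                half = size // 2
                next_regions.extend([half, size - half])
            else:
                next_regions.append(size)
        regions = next_regions
    return regions


if __name__ == "__main__":
    limit = 60 * GB  # the 60GB per-region limit from the example
    for total_gb in (40, 100, 150):
        regions = simulate_splits(total_gb * GB, limit)
        print(total_gb, "GB ->", len(regions), "region(s)")
```

In this model 40GB stays in one region, 100GB ends up in 2 regions and 150GB in 4. The point is that the limit caps the size of each region, not the total amount of data the table can hold.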

Check out this section of the RefGuide:

http://hbase.apache.org/book.html#regions.arch

which explains how regions are the units by which data is distributed across
your cluster.

The trick is that you don't want regions too small, but you also don't want
them too big - because you'll wind up with what the ref guide describes in
this chapter...
9.7.1. Region Size

HBase scales by having regions across many servers. Thus if
you have 2 regions for 16GB data, on a 20 node machine your data
will be concentrated on just a few machines - nearly the entire
cluster will be idle.  This really cant be stressed enough, since a
common problem is loading 200MB data into HBase then wondering why
your awesome 10 node cluster isn't doing anything.
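
The arithmetic behind that passage can be sketched in a few lines of Python. This is only an illustration and assumes that each region is served by exactly one region server at a time; it ignores HDFS replication, which spreads the stored files around but not the serving load. The function name is hypothetical.

```python
def busy_and_idle_servers(num_regions, num_servers):
    """Return (busy, idle) as an upper bound on how many servers can serve data.

    Assumption: a region is served by exactly one region server at a time,
    so at most min(num_regions, num_servers) servers can be doing work.
    """
    busy = min(num_regions, num_servers)
    return busy, num_servers - busy


if __name__ == "__main__":
    # The ref guide's example: 2 regions on a 20 node cluster.
    busy, idle = busy_and_idle_servers(2, 20)
    print("busy:", busy, "idle:", idle)
```

With 2 regions on 20 nodes, at most 2 servers are busy and 18 sit idle, which is why very large regions on a small data set leave most of the cluster unused.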

On 11/1/12 4:09 AM, "Cheng Su" <[EMAIL PROTECTED]> wrote:

>Thank you for your answer.
>The max file size the whole cluster can store for one CF is 60G, right?
>Maybe the only way is to split the large table into small tables...
>
>On Thu, Nov 1, 2012 at 3:05 PM, ramkrishna vasudevan
><[EMAIL PROTECTED]> wrote:
>> Can multiple region servers run on one real machine?
>> (I guess not though)
>> No.. Every RS runs in different physical machines.
>>
>> max.file.size actually applies per region.  Suppose you create a table,
>> then insert data: once it passes 20G it will get split into further
>> regions.
>> Yes, all 60G of data can be stored on one physical machine, but that means
>> the data is logically served by 3 regions.
>> Does this help you?
>>
>> Regards
>> Ram
>>
>> On Thu, Nov 1, 2012 at 12:15 PM, Cheng Su <[EMAIL PROTECTED]> wrote:
>>
>>> Does that mean the max file size of 1 cf is 20G? If I have 3 region
>>> servers, then 60G total?
>>> I have a very large table; the size of one cf (which contains only one
>>> column) may exceed 60G.
>>> Is there any chance to store the data without adding machines?
>>>
>>> Can multiple region servers run on one real machine?
>>> (I guess not though)
>>>
>>> On Thu, Nov 1, 2012 at 1:35 PM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>>> > The tribal knowledge would say about 20G is the max.
>>> > The fellas from Facebook will have a more definite answer.
>>> >
>>> > -- Lars
>>> >
>>> >
>>> >
>>> > ________________________________
>>> >  From: Cheng Su <[EMAIL PROTECTED]>
>>> > To: [EMAIL PROTECTED]
>>> > Sent: Wednesday, October 31, 2012 10:22 PM
>>> > Subject: Does hbase.hregion.max.filesize have a limit?
>>> >
>>> > Hi, all.
>>> >
>>> > I have a simple question: does hbase.hregion.max.filesize have a limit?
>>> > May I specify a very large value for this, like 40G or more? (don't
>>> > consider the performance)
>>> > I didn't find any description of this on the official site or Google.
>>> >
>>> > Thanks.
>>> >
>>> > --
>>> >
>>> > Regards,
>>> > Cheng Su
>>>
>>>
>>>
>>> --
>>>
>>> Regards,
>>> Cheng Su
>>>
>
>
>
>--
>
>Regards,
>Cheng Su
>
+
Kevin Odell 2012-11-01, 13:53