HBase >> mail # user >> Does hbase.hregion.max.filesize have a limit?


Re: Does hbase.hregion.max.filesize have a limit?
There are two schools of thought here.  The first is manually splitting your
own regions.  In this case you would not want your regions over 20GB for
HFilev2 or 4GB for HFilev1, but you would set your max file size to
something like 100GB so you can split when you want to and the system won't
automagically do it for you.  The second is letting HBase handle this for
you, in which case you still would not want your max file size over 20GB
for HFilev2 or 4GB for HFilev1, and HBase would handle your splits (sorry
if this seems redundant).
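For the manual-split approach, the relevant setting lives in hbase-site.xml.  A
minimal sketch - the 100GB value here is illustrative, not a recommendation:

```xml
<!-- hbase-site.xml: raise the split threshold so HBase effectively
     never auto-splits; splits are then issued by hand. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- ~100GB in bytes -->
  <value>107374182400</value>
</property>
```

With this in place you trigger splits yourself from the HBase shell, e.g.
`split 'mytable'` for a whole table or `split 'regionName'` for one region.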

On Thu, Nov 1, 2012 at 8:26 AM, Doug Meil <[EMAIL PROTECTED]> wrote:

>
> Hi there-
>
> re:  "The max file size the whole cluster can store for one CF is 60G,
> right?"
>
> No, the max file-size for a region, in your example, is 60GB.  When the
> data exceeds that, the region will split - and then you'll have 2 regions,
> each with a 60GB limit.
>
> Check out this section of the RefGuide:
>
> http://hbase.apache.org/book.html#regions.arch
>
> which explains how regions are the way data is distributed across your cluster.
>
> The trick is that you don't want regions too small, but you also don't want
> them too big - because you'll wind up with what the ref guide describes in
> this chapter...
>
>
> 9.7.1. Region Size
>
> HBase scales by having regions across many servers. Thus if
>           you have 2 regions for 16GB data, on a 20 node machine your data
>           will be concentrated on just a few machines - nearly the entire
>           cluster will be idle.  This really can't be stressed enough, since
>           a common problem is loading 200MB data into HBase then wondering
>           why your awesome 10 node cluster isn't doing anything.
>
> On 11/1/12 4:09 AM, "Cheng Su" <[EMAIL PROTECTED]> wrote:
>
> >Thank you for your answer.
> >The max file size the whole cluster can store for one CF is 60G, right?
> >Maybe the only way is to split the large table into small tables...
> >
> >On Thu, Nov 1, 2012 at 3:05 PM, ramkrishna vasudevan
> ><[EMAIL PROTECTED]> wrote:
> >> Can multiple region servers runs on one real machine?
> >> (I guess not though)
> >> No.  Every RS runs on a different physical machine.
> >>
> >> max.file.size actually applies per region.  Suppose you create a table
> >> and then insert 20G of data; that will get split into further regions
> >> automatically.  Yes, all 60G of data can be stored on one physical
> >> machine, but that means the data is logically served by 3 regions.
> >> Does this help you?
> >>
> >> Regards
> >> Ram
> >>
> >> On Thu, Nov 1, 2012 at 12:15 PM, Cheng Su <[EMAIL PROTECTED]> wrote:
> >>
> >>> Does that mean the max file size of 1 cf is 20G? If I have 3 region
> >>> servers, then 60G total?
> >>> I have a very large table; the size of one cf (containing only one
> >>> column) may exceed 60G.
> >>> Is there any chance to store the data without adding machines?
> >>>
> >>> Can multiple region servers run on one real machine?
> >>> (I guess not though)
> >>>
> >>> On Thu, Nov 1, 2012 at 1:35 PM, lars hofhansl <[EMAIL PROTECTED]>
> >>> wrote:
> >>> > The tribal knowledge would say about 20G is the max.
> >>> > The fellas from Facebook will have a more definite answer.
> >>> >
> >>> > -- Lars
> >>> >
> >>> >
> >>> >
> >>> > ________________________________
> >>> >  From: Cheng Su <[EMAIL PROTECTED]>
> >>> > To: [EMAIL PROTECTED]
> >>> > Sent: Wednesday, October 31, 2012 10:22 PM
> >>> > Subject: Does hbase.hregion.max.filesize have a limit?
> >>> >
> >>> > Hi, all.
> >>> >
> >>> > I have a simple question: does hbase.hregion.max.filesize have a
> >>> > limit? May I specify a very large value for it, like 40G or more?
> >>> > (Don't consider the performance.)
> >>> > I didn't find any description of this on the official site or Google.
> >>> >
> >>> > Thanks.
> >>> >
> >>> > --
> >>> >
> >>> > Regards,
> >>> > Cheng Su
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Regards,
> >>> Cheng Su
> >>>
>
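One way to address the original question - storing a large CF without adding
machines - that follows from the thread above is to pre-split the table at
creation time, so the data is spread over many regions (and thus servers) from
the start rather than waiting for splits at the max-filesize threshold.  A
sketch from the HBase shell; the table name and split keys are made up:

```
create 'big_table', 'cf', SPLITS => ['row2000000', 'row4000000', 'row6000000']
```

This creates four regions up front, keyed on the given boundaries, which the
balancer can then distribute across the region servers.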

Kevin O'Dell
Customer Operations Engineer, Cloudera