Cheng Su (2012-11-01, 05:22)
lars hofhansl (2012-11-01, 05:35)
Cheng Su (2012-11-01, 06:45)
ramkrishna vasudevan (2012-11-01, 07:05)
Cheng Su (2012-11-01, 08:09)
Doug Meil (2012-11-01, 13:26)
Kevin O'dell (2012-11-01, 13:53)
Jeremy Carroll (2012-11-01, 16:39)
Cheng Su (2012-11-02, 01:20)
-
Does hbase.hregion.max.filesize have a limit? (Cheng Su, 2012-11-01, 05:22)
Hi, all.
I have a simple question: does hbase.hregion.max.filesize have a limit? Can I set it to a very large value, like 40G or more (ignoring performance)? I couldn't find anything about this on the official site or through Google.
Thanks.
--
Regards,
Cheng Su
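As far as I know, hbase.hregion.max.filesize takes a plain byte count, so a value like "40G" has to be written out in bytes in hbase-site.xml. The following is a minimal sketch, assuming binary (1024-based) units; the class and method names are hypothetical helpers, not part of HBase.

```java
import java.util.Locale;

/**
 * Hypothetical helper: converts a human-readable size such as "40G" into the
 * plain byte count used as the value of hbase.hregion.max.filesize.
 */
public class MaxFileSizeValue {

    /** Parses "<number><K|M|G|T>" (binary units, case-insensitive) into bytes. */
    static long toBytes(String size) {
        String s = size.trim().toUpperCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        long multiplier;
        switch (unit) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default:  return Long.parseLong(s); // no unit suffix: already bytes
        }
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        return Math.multiplyExact(value, multiplier); // throws on overflow
    }

    public static void main(String[] args) {
        long bytes = toBytes("40G");
        // Print the <property> element one would place in hbase-site.xml.
        System.out.println("<property>");
        System.out.println("  <name>hbase.hregion.max.filesize</name>");
        System.out.println("  <value>" + bytes + "</value>"); // 42949672960
        System.out.println("</property>");
    }
}
```

Whether such a large value is sensible is a separate question, which is what the replies below discuss.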
-
Re: Does hbase.hregion.max.filesize have a limit? (lars hofhansl, 2012-11-01, 05:35)
The tribal knowledge would say about 20G is the max.
The fellas from Facebook will have a more definite answer.
-- Lars
-
Re: Does hbase.hregion.max.filesize have a limit? (Cheng Su, 2012-11-01, 06:45)
Does that mean the max file size of one CF is 20G? If I have 3 region servers, is it then 60G in total?
I have a very large table; the size of one CF (which contains only one column) may exceed 60G. Is there any way to store the data without adding machines?
Can multiple region servers run on one real machine? (I guess not, though.)
--
Regards,
Cheng Su
-
Re: Does hbase.hregion.max.filesize have a limit? (ramkrishna vasudevan, 2012-11-01, 07:05)
> Can multiple region servers run on one real machine?
> (I guess not, though.)
No. In a normal deployment, each RS runs on its own physical machine.
hbase.hregion.max.filesize actually applies per region. Suppose you create a table and then insert more than 20G of data: the region will automatically be split into further regions. So yes, all 60G of data can be stored on one physical machine; it just means the data is logically served by 3 regions.
Does this help you?
Regards
Ram
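To make Ram's point concrete: the limit bounds a region, so the number of region servers never enters the calculation. A minimal sketch of the lower bound on region count, assuming every region could be filled right up to the limit (in practice regions split before that, so the real count is usually higher); the class name is hypothetical.

```java
/**
 * Sketch: the fewest regions a column family's data can occupy, given a
 * per-region limit. The number of region servers does not appear anywhere:
 * hbase.hregion.max.filesize bounds a region, not a server or a cluster.
 */
public class RegionCount {

    static final long GB = 1L << 30;

    /** Ceiling division: regions needed if every region were filled to the limit. */
    static long minRegions(long dataBytes, long maxFileSizeBytes) {
        if (maxFileSizeBytes <= 0) {
            throw new IllegalArgumentException("max file size must be positive");
        }
        return (dataBytes + maxFileSizeBytes - 1) / maxFileSizeBytes;
    }

    public static void main(String[] args) {
        // 60G of data with a 20G per-region limit: at least 3 regions,
        // which a single region server can host.
        System.out.println(minRegions(60 * GB, 20 * GB)); // 3
        // 200G needs at least 10 such regions; nothing here caps data per server.
        System.out.println(minRegions(200 * GB, 20 * GB)); // 10
    }
}
```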
-
Re: Does hbase.hregion.max.filesize have a limit? (Cheng Su, 2012-11-01, 08:09)
Thank you for your answer.
So the max file size the whole cluster can store for one CF is 60G, right? Maybe the only way is to split the large table into smaller tables...
--
Regards,
Cheng Su
-
Re: Does hbase.hregion.max.filesize have a limit? (Doug Meil, 2012-11-01, 13:26)
Hi there,
re: "The max file size the whole cluster can store for one CF is 60G, right?"
No, the max file size applies to a region; in your example, that is 60GB. When the data exceeds that, the region will split, and then you'll have 2 regions, each with a 60GB limit.
Check out this section of the RefGuide:
http://hbase.apache.org/book.html#regions.arch
It explains that regions are how data is distributed across your cluster.
The trick is that you don't want regions too small, but you also don't want them too big, because you'll wind up with what the RefGuide describes in this chapter...
    9.7.1. Region Size
    HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB data, on a 20-node cluster your data will be concentrated on just a few machines; nearly the entire cluster will be idle. This really can't be stressed enough, since a common problem is loading 200MB data into HBase then wondering why your awesome 10 node cluster isn't doing anything.
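The RefGuide excerpt Doug quotes comes down to simple arithmetic: a region is served by one region server at a time, so a table can keep at most min(regions, servers) servers busy. A minimal sketch of that bound; the class and method names are hypothetical.

```java
/**
 * Sketch of the Region Size point: each region is served by one region
 * server at a time, so at most min(regions, servers) servers can be doing
 * work for a table, and the rest sit idle.
 */
public class IdleServers {

    static int busyServersAtMost(int regions, int servers) {
        return Math.min(regions, servers);
    }

    static int idleServersAtLeast(int regions, int servers) {
        return servers - busyServersAtMost(regions, servers);
    }

    public static void main(String[] args) {
        // The RefGuide example: 16GB of data in 2 regions on a 20-node cluster.
        int regions = 2, servers = 20;
        System.out.printf("busy <= %d, idle >= %d%n",
                busyServersAtMost(regions, servers),
                idleServersAtLeast(regions, servers)); // busy <= 2, idle >= 18
    }
}
```

This is why overly large regions hurt even when they "work": fewer regions means fewer servers sharing the load.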
-
Re: Does hbase.hregion.max.filesize have a limit? (Kevin O'dell, 2012-11-01, 13:53)
There are two schools of thought here.
The first is manually splitting your own regions. In this case you would not want your regions over 20GB for HFileV2 or 4GB for HFileV1, but you would set your max filesize to something like 100GB, so that you can split when you want to and the system won't automagically do it for you.
The second is letting HBase handle this for you. In that case you still would not want your max filesize over 20GB for HFileV2 or 4GB for HFileV1, and HBase would then handle your splits (sorry if this seems redundant).
Kevin O'Dell
Customer Operations Engineer, Cloudera
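Kevin's two approaches differ only in what hbase.hregion.max.filesize is used for: a split trigger (automatic) or a safety net set high enough that HBase never fires it (manual). The sketch below is a hypothetical model of that advice; the 20GB/4GB ceilings and the 100GB figure are the rules of thumb from this thread, not limits enforced by HBase, and none of the names are HBase APIs.

```java
/**
 * Hypothetical model of the two splitting approaches. The ceilings are the
 * thread's rules of thumb, not limits enforced by HBase.
 */
public class SplitStrategyAdvice {

    static final long GB = 1L << 30;

    enum Strategy { MANUAL_SPLITS, AUTOMATIC_SPLITS }

    /** Rule-of-thumb ceiling for a single region's size. */
    static long regionCeiling(int hfileVersion) {
        if (hfileVersion == 1) return 4 * GB;
        if (hfileVersion == 2) return 20 * GB;
        throw new IllegalArgumentException("unknown HFile version: " + hfileVersion);
    }

    /** Suggested hbase.hregion.max.filesize value for each approach. */
    static long suggestedMaxFileSize(Strategy strategy, int hfileVersion) {
        switch (strategy) {
            case MANUAL_SPLITS:
                // Set high so HBase effectively never splits on its own;
                // the operator splits regions that grow past the ceiling.
                return 100 * GB;
            case AUTOMATIC_SPLITS:
                // Let HBase split, but keep the trigger at the ceiling.
                return regionCeiling(hfileVersion);
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }

    /** Under manual splitting: has this region grown past the ceiling? */
    static boolean shouldSplitManually(long regionBytes, int hfileVersion) {
        return regionBytes > regionCeiling(hfileVersion);
    }

    public static void main(String[] args) {
        System.out.println(suggestedMaxFileSize(Strategy.AUTOMATIC_SPLITS, 2) / GB); // 20
        System.out.println(suggestedMaxFileSize(Strategy.MANUAL_SPLITS, 2) / GB);    // 100
        System.out.println(shouldSplitManually(25 * GB, 2));                          // true
    }
}
```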
-
Re: Does hbase.hregion.max.filesize have a limit? (Jeremy Carroll, 2012-11-01, 16:39)
To speak to whether it's possible: yes, it is. Here at Klout, during testing, we had some tables where we set the max region size to 100GB, and we actually had tables of that size during an MR job that created HFileV2s for us to import. So I can say that I have seen 100GB regions that still work.
As to whether this is a good idea: it's probably not. As a capacity-planning exercise, we added nodes to the cluster and split these regions down to 10-20GB in size.
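Going from 100GB regions to the 10-20GB range Jeremy mentions takes a few rounds of splitting. A minimal sketch, assuming each split produces two equal halves (real splits cut at a middle row key, so daughters are only roughly equal); the class name is hypothetical.

```java
/**
 * Sketch: how many rounds of splitting take one oversized region below a
 * target size, assuming each split produces two equal halves.
 */
public class SplitRounds {

    /** Number of halving rounds until the region size is <= targetBytes. */
    static int roundsNeeded(long regionBytes, long targetBytes) {
        if (targetBytes <= 0) throw new IllegalArgumentException("target must be positive");
        int rounds = 0;
        long size = regionBytes;
        while (size > targetBytes) {
            size /= 2;   // one split: each daughter holds about half the data
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        int rounds = roundsNeeded(100 * gb, 20 * gb);
        long daughters = 1L << rounds;  // every daughter is split again each round
        System.out.println(rounds + " rounds -> " + daughters + " regions of ~"
                + (100.0 / daughters) + "GB"); // 3 rounds -> 8 regions of ~12.5GB
    }
}
```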
-
Re: Does hbase.hregion.max.filesize have a limit? (Cheng Su, 2012-11-02, 01:20)
Thank you all, guys.
I found out that I had confused "the size of a region" with "the size of a region server". I found this property:
    <property>
      <name>hbase.regionserver.regionSplitLimit</name>
      <value>2147483647</value>
      <description>Limit for the number of regions after which no more region
      splitting should take place. This is not a hard limit for the number of
      regions but acts as a guideline for the regionserver to stop splitting after
      a certain limit. Default is set to MAX_INT; i.e. do not block splitting.
      </description>
    </property>
So in practice a region server can host plenty of regions, and I don't need to worry about the store size.
Thank you all again.
--
Regards,
Cheng Su
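Cheng's conclusion can be sketched as a simple model: what a region server holds is roughly its region count times the region size, and with regionSplitLimit at its default of MAX_INT, splitting never stops for lack of headroom. This is a hypothetical simplification based on the property description quoted in the message, not HBase's own split-scheduling code.

```java
/**
 * Hypothetical model of hbase.regionserver.regionSplitLimit as described in
 * its documentation: a guideline number of regions per region server after
 * which splitting stops. A simplification, not HBase's own code.
 */
public class SplitLimitModel {

    static final long GB = 1L << 30;
    static final int DEFAULT_REGION_SPLIT_LIMIT = Integer.MAX_VALUE; // 2147483647

    /** Splitting is still allowed while the server hosts fewer regions than the limit. */
    static boolean splittingAllowed(int onlineRegions, int regionSplitLimit) {
        return onlineRegions < regionSplitLimit;
    }

    /** Rough data a server holds if every region sits at the per-region limit. */
    static long approxServerBytes(int regions, long maxFileSizeBytes) {
        return Math.multiplyExact((long) regions, maxFileSizeBytes);
    }

    public static void main(String[] args) {
        int regions = 3;
        long perServer = approxServerBytes(regions, 20 * GB);
        System.out.println(perServer / GB + "GB on one server, splitting allowed: "
                + splittingAllowed(regions, DEFAULT_REGION_SPLIT_LIMIT));
        // prints "60GB on one server, splitting allowed: true"
    }
}
```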