-
Limits on HBase
William Kang 2010-09-06, 20:56
Hi folks,
I know this question may have been asked many times, but I am wondering if there is any update on the optimized cell size (in megabytes) and row size (in megabytes)? Many thanks.

William
-
RE: Limits on HBase
Jonathan Gray 2010-09-06, 23:10
I'm not sure what you mean by "optimized cell size" or whether you're just asking about practical limits?

HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that level, I have seen some weird performance issues.

The most important thing is to be sure to tweak all of your settings. If you have 20MB cells, you need to be sure to increase the flush size beyond 64MB and the split size beyond 256MB. You also need enough memory to support all this large object allocation.

And of course, test test test. That's the easiest way to see if what you want to do will work :)

When you run into problems, e-mail the list.

As far as row size is concerned, the only issue is that a row can never span multiple regions, so a given row can only be in one region and thus be hosted on one server at a time.

JG
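To make the point about flush and split sizes concrete, here is a rough Python sketch of the arithmetic. The 64MB and 256MB thresholds and the 20MB cell size come from the message above; the helper name `cells_before_threshold` is hypothetical and is not an HBase API.

```python
# Rough arithmetic only: how many cells of a given size fit under a
# size threshold before a flush or split would be triggered.
# `cells_before_threshold` is a hypothetical helper, not an HBase call.

MB = 1024 * 1024


def cells_before_threshold(threshold_bytes: int, cell_bytes: int) -> int:
    """Return how many whole cells fit under the threshold."""
    if cell_bytes <= 0:
        raise ValueError("cell_bytes must be positive")
    return threshold_bytes // cell_bytes


if __name__ == "__main__":
    cell = 20 * MB
    print("cells per 64MB flush:", cells_before_threshold(64 * MB, cell))
    print("cells per 256MB split:", cells_before_threshold(256 * MB, cell))
```

With 20MB cells, only a handful of writes fit under either threshold, which is presumably why raising both settings is suggested for large values.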
-
Re: Limits on HBase
William Kang 2010-09-07, 05:40
Hi JG,
Thanks for your reply. As far as I have read in HBase's documentation and wiki, the cell size is not supposed to be larger than 10 MB. For the row, I am not quite sure, but it looks like 256 MB is the upper limit.

I am considering storing some binary data that used to be stored in an RDBMS BLOB field. The size of those binary objects may vary from hundreds of KB to hundreds of MB. What would be a good way to use HBase for this? We really want to use HBase to avoid that scaling problem. Many thanks.

William
-
Re: Limits on HBase
Himanshu Vashishtha 2010-09-07, 06:49
Assuming you will be using HDFS as the file system: wouldn't saving those large objects in the file system and keeping a pointer to them in an HBase table serve the purpose?

[I haven't done it myself, but I can't see it not working. In fact, I remember reading about it somewhere on the list.]

~Himanshu
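The pointer idea described above could look roughly like the following Python sketch. It is only an illustration under loud assumptions: a local directory stands in for HDFS, a plain dict stands in for the HBase table, and the function names and the `blob:path` / `blob:size` column names are invented for the example.

```python
import hashlib
import os
import tempfile

# Stand-ins: a local directory plays the role of HDFS and a dict plays
# the role of an HBase table. Both are assumptions for illustration.


def store_blob(blob_dir: str, table: dict, row_key: str, data: bytes) -> str:
    """Write the blob to the file store and record its path in the table."""
    name = hashlib.sha256(data).hexdigest()  # content-derived file name
    path = os.path.join(blob_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    # The table row holds only a small pointer, never the blob itself.
    table[row_key] = {"blob:path": path, "blob:size": len(data)}
    return path


def load_blob(table: dict, row_key: str) -> bytes:
    """Follow the pointer stored in the table and read the blob back."""
    with open(table[row_key]["blob:path"], "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as blob_dir:
        table = {}
        store_blob(blob_dir, table, "doc-1", b"x" * 1024)
        print(len(load_blob(table, "doc-1")))
```

The table stays small regardless of blob size, at the cost of managing the files and the pointers separately.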
-
Re: Limits on HBase
Himanshu Vashishtha 2010-09-07, 14:32
But yes, you will not have different versions of those objects, as they are not stored as such in a table. So that's the downside. If your objects are write-once, read-many, I think it should work.

Let's see what others say :)

~Himanshu
-
RE: Limits on HBase
Andrew Purtell 2010-09-07, 18:22
In addition to what Jon said please be aware that if compression is specified in the table schema, it happens at the store file level -- compression happens after write I/O, before read I/O, so if you transmit a 100MB object that compresses to 30MB, the performance impact is that of 100MB, not 30MB.

I also try not to go above 50MB as largest cell size, for the same reason. I have tried storing objects larger than 100MB but this can cause out of memory issues on busy regionservers no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue.

Best regards,

- Andy

Why is this email five sentences or less?
http://five.sentenc.es/
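The compression point above can be illustrated with a small Python sketch. This is a toy model, not HBase code: `zlib` stands in for whatever codec the table schema specifies, and `wire_and_stored_sizes` is a hypothetical helper that simply contrasts the size of what is transmitted with the size of what ends up stored.

```python
import zlib


def wire_and_stored_sizes(payload: bytes) -> tuple:
    """Return (bytes transmitted, bytes stored) for one value.

    Toy model of the behaviour described above: the value travels
    uncompressed and is only compressed at the store-file level.
    """
    wire_bytes = len(payload)                   # sent as-is over the network
    stored_bytes = len(zlib.compress(payload))  # compressed only when stored
    return wire_bytes, stored_bytes


if __name__ == "__main__":
    wire, stored = wire_and_stored_sizes(b"a" * 1_000_000)
    print("transmitted:", wire, "stored:", stored)
```

In this model the network and memory cost follows the uncompressed size, which matches the remark that a 100MB object costs 100MB even if it compresses well.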
-
Re: Limits on HBase
William Kang 2010-09-08, 02:36
Hi,
Thanks for your reply. How about the row size? I read that a row should not be larger than the HDFS file on the region server, which is 256 MB by default. Is that right? Many thanks.

William
-
RE: Limits on HBase
Jonathan Gray 2010-09-08, 03:30
You can go way beyond the max region split / split size. HBase will never split the region once it is a single row, even if beyond the split size.

Also, if you're using large values, you should have region sizes much larger than the default. It's common to run with 1-2GB regions in many cases.

What you may have seen are recommendations that if your cell values are approaching the default block size on HDFS (64MB), you should consider putting the data directly into HDFS rather than HBase.

JG
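The split rule described above can be written down as a tiny Python predicate. This is a toy model of the stated behaviour only; `can_split` and its parameters are hypothetical and do not reflect how HBase actually implements its split decision.

```python
MB = 1024 * 1024


def can_split(region_bytes: int, max_region_bytes: int, distinct_rows: int) -> bool:
    """Toy rule: a region is a split candidate only when it is over the
    size limit AND holds more than one row, since a row cannot span regions."""
    return region_bytes > max_region_bytes and distinct_rows > 1


if __name__ == "__main__":
    # A single oversized row: over the limit, but nothing to split on.
    print(can_split(300 * MB, 256 * MB, distinct_rows=1))
    # Same size spread over several rows: eligible for a split.
    print(can_split(300 * MB, 256 * MB, distinct_rows=5))
```

Under this reading, the split size bounds how large a multi-row region grows, not how large a single row may be.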
-
Re: Limits on HBase
William Kang 2010-10-13, 08:31
So, basically, there is no limit on row size as long as a single row does not go beyond the region server's storage capacity? And why should the cell size not be larger than 20 MB?

Does the data block in an HFile store cells or whole rows? If the data block stores cells (qualifiers and values), where is the key that points to the row in the HFile structure, or is an HFile just for one row? I cannot seem to find a direct answer to these questions in http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
-
Re: Limits on HBase
Sean Bigdatafun 2010-10-15, 00:23
Let me ask this question from another angle:

The first question is: if I have millions of columns in a column family in the same row, such that the sum of the key-value pairs exceeds 256MB, what will happen?

Example: I have a column with a key of 256 bytes and a value of 2K. Let's assume (256 + timestamp size + 2056) ~= 2.5k; then I understand I can store at most 256 * 1024 / 2.5 = 104,857 columns in this column family in this row.

Does anyone have comments on the math I gave above?

The second question is: if I do not turn on LZO, is my data also compressed (by the system)? If so, then the above number will increase a couple of times, but there is still a limit on how many columns I can put in a row.

The third question is: if I do turn on LZO, does that mean the value gets compressed first, and then the HBase mechanism further compresses the key-value pair?

Thanks,
Sean
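The estimate in the message above can be reproduced with a few lines of Python. This only restates the poster's arithmetic: the 256MB ceiling and the 2.5KB per-cell figure are the poster's own assumptions, and `max_cells` is a hypothetical helper name. Whether 256MB is really a per-row cap is a separate question that the replies in this thread address.

```python
def max_cells(limit_mb: float, cell_kb: float) -> int:
    """Whole cells of `cell_kb` kilobytes that fit in `limit_mb` megabytes."""
    if cell_kb <= 0:
        raise ValueError("cell_kb must be positive")
    return int(limit_mb * 1024 / cell_kb)


if __name__ == "__main__":
    # 256 MB expressed in KB, divided by roughly 2.5 KB per key-value pair.
    print(max_cells(256, 2.5))
```

Under those assumptions the result is a little under 105,000 cells.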
-
Re: Limits on HBase
Ryan Rawson 2010-10-15, 01:41
If you have a single row that approaches then exceeds the size of a region, eventually you will end up having that row as a single region, with the region encompassing only that one row.

The reason for HBase and Bigtable is the overhead that HDFS has: every file in HDFS uses an amount of RAM that does not depend on the size of the file. This means that the more small files you have, the more RAM you use, until you run out of namenode scalability. So HBase exists to store smaller values. There is some overhead. Thus once you start putting in larger values, you might as well avoid the overhead and go straight to/from HDFS.

-ryan
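The small-files argument above comes down to simple multiplication, sketched here in Python. The per-file RAM cost below is a placeholder chosen for illustration, not a figure from this thread, and both function names are hypothetical.

```python
import math

# Placeholder cost per file entry held in namenode RAM. The real figure
# depends on the Hadoop version and is NOT taken from the thread.
ASSUMED_BYTES_PER_FILE = 150


def files_needed(num_values: int, values_per_file: int) -> int:
    """Number of files required when packing values into files."""
    return math.ceil(num_values / values_per_file)


def namenode_ram_bytes(num_files: int,
                       bytes_per_file: int = ASSUMED_BYTES_PER_FILE) -> int:
    """RAM grows with the number of files, not with their size."""
    return num_files * bytes_per_file


if __name__ == "__main__":
    values = 10_000_000
    one_per_file = files_needed(values, 1)
    packed = files_needed(values, 100_000)
    print("one value per file:", namenode_ram_bytes(one_per_file), "bytes")
    print("packed into large files:", namenode_ram_bytes(packed), "bytes")
```

Packing many small values into a few large files, which is roughly what HBase does on top of HDFS, keeps the file count and therefore the namenode footprint low.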
-
Re: Limits on HBaseSean Bigdatafun 2010-10-15, 09:20
On Thu, Oct 14, 2010 at 6:41 PM, Ryan Rawson <[EMAIL PROTECTED]> wrote:
> If you have a single row that approaches then exceeds the size of a > region, eventually you will end up having that row as a single region, > with the region encompassing only that one region. > > The reason for HBase and bigtable is that the overhead that HDFS > has... every file in HDFS uses a size of RAM that is not dependent on > the size of the file. Meaning the more files you have, that are > small, you use more and more RAM and run out of namenode scalability. > So HBase exists to store smaller values. There is some overhead. Thus > once you start putting in larger values, you might as well avoid the > overhead and go straight to/from HDFS. While, for the scenario that I listed above: millions of small key-value pairs that end up with exceed 256MB, storing these key-value pairs directly into a file in HDFS would not be an option. If we do so, we end up scan throught the whole file; and if we store them into HBase, we are going to leverage the information of the index. > > -ryan > > > On Thu, Oct 14, 2010 at 5:23 PM, Sean Bigdatafun > <[EMAIL PROTECTED]> wrote: > > Let me ask this question from another angle: > > > > The first question is --- > > if I have millions of column in a column family in the same row, such > that > > the sum of the key-value pairs exceeds 256MB, what will happen? > > > > example: > > I have a column with key of 256bytes, and the value of 2K, then let's > assume > > (256 + timestampe size + 2056) ~=2.5k, > > then I understand I can at most story 256 * 1024 / 2.5 = 104,875 columns > in > > this column family at this row. > > > > Anyone has comments on the math I gave above? > > > > > > The second question is -- > > By the way, if I do not turn on the LZO, is my data also compressed (by > the > > system)? -- if so, then the above number will increase a couple of times, > > but still there exists a number for the limit of how many columns I can > put > > in a row. 
> >
> > The third question is --
> > If I do turn on LZO, does that mean the value get compressed first, and then
> > the HBase mechanism further compress the key-value pair?
> >
> > Thanks,
> > Sean
> >
> >
> > On Tue, Sep 7, 2010 at 8:30 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
> >
> >> You can go way beyond the max region split / split size. HBase will never
> >> split the region once it is a single row, even if beyond the split size.
> >>
> >> Also, if you're using large values, you should have region sizes much
> >> larger than the default. It's common to run with 1-2GB regions in many
> >> cases.
> >>
> >> What you may have seen are recommendations that if your cell values are
> >> approaching the default block size on HDFS (64MB), you should consider
> >> putting the data directly into HDFS rather than HBase.
> >>
> >> JG
> >>
> >> > -----Original Message-----
> >> > From: William Kang [mailto:[EMAIL PROTECTED]]
> >> > Sent: Tuesday, September 07, 2010 7:36 PM
> >> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> >> > Subject: Re: Limits on HBase
> >> >
> >> > Hi,
> >> > Thanks for your reply. How about the row size? I read that a row should not
> >> > be larger than the hdfs file on region server which is 256M in default. Is
> >> > it right? Many thanks.
> >> >
> >> >
> >> > William
> >> >
> >> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >> >
> >> > > In addition to what Jon said please be aware that if compression is
> >> > > specified in the table schema, it happens at the store file level --
> >> > > compression happens after write I/O, before read I/O, so if you transmit a
> >> > > 100MB object that compresses to 30MB, the performance impact is that of
> >> > > 100MB, not 30MB.
> >> > >
> >> > > I also try not to go above 50MB as largest cell size, for the same reason.
> >> > > I have tried storing objects larger than 100MB but this can cause
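As a rough check of the column-count arithmetic quoted above, the sketch below takes the figures from the thread at face value: a 256MB split size and roughly 2.5KB per key-value pair. The function and constant names are illustrative only, and the per-cell size is the poster's approximation rather than a measured value. The result comes out near 104,857 rather than 104,875. Per Jonathan Gray's reply it is also not a hard limit, since HBase does not split a region that holds a single row.

```python
# Rough estimate only: 2.5 KB per cell is the approximation used in the
# thread (about 256 B key + about 2 KB value + timestamp), not a measurement.
REGION_SPLIT_SIZE_MB = 256   # default split size mentioned in the thread
CELL_SIZE_KB = 2.5           # approximate size of one key-value pair


def cells_before_split_size(split_size_mb: float, cell_size_kb: float) -> int:
    """Return how many cells of the given size fit under the split size."""
    return int(split_size_mb * 1024 / cell_size_kb)


if __name__ == "__main__":
    print(cells_before_split_size(REGION_SPLIT_SIZE_MB, CELL_SIZE_KB))
```

Doubling the assumed cell size halves the count, so the estimate is only as good as the per-cell figure fed into it.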
-
Re: Limits on HBase
William Kang 2010-09-08, 04:07
Hi,
What does the performance look like if we put large cells in HDFS versus the local file system? Random access to HDFS would be slow, right?

William

On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:

> You can go way beyond the max region split / split size. HBase will never
> split the region once it is a single row, even if beyond the split size.
>
> Also, if you're using large values, you should have region sizes much
> larger than the default. It's common to run with 1-2GB regions in many
> cases.
>
> What you may have seen are recommendations that if your cell values are
> approaching the default block size on HDFS (64MB), you should consider
> putting the data directly into HDFS rather than HBase.
>
> JG
>
> > -----Original Message-----
> > From: William Kang [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, September 07, 2010 7:36 PM
> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> > Subject: Re: Limits on HBase
> >
> > Hi,
> > Thanks for your reply. How about the row size? I read that a row should not
> > be larger than the hdfs file on region server which is 256M in default. Is
> > it right? Many thanks.
> >
> >
> > William
> >
> > On Tue, Sep 7, 2010 at 2:22 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> >
> > > In addition to what Jon said please be aware that if compression is
> > > specified in the table schema, it happens at the store file level --
> > > compression happens after write I/O, before read I/O, so if you transmit a
> > > 100MB object that compresses to 30MB, the performance impact is that of
> > > 100MB, not 30MB.
> > >
> > > I also try not to go above 50MB as largest cell size, for the same reason.
> > > I have tried storing objects larger than 100MB but this can cause out of
> > > memory issues on busy regionservers no matter the size of the heap. When/if
> > > HBase RPC can send large objects in smaller chunks, this will be less of an
> > > issue.
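The tuning advice quoted in this thread (a flush size beyond 64MB and a split size beyond 256MB for 20MB cells, with 1-2GB regions common for large values) is easier to apply once the sizes are written out in bytes. The sketch below is only an illustration under stated assumptions: `hbase.hregion.memstore.flush.size` and `hbase.hregion.max.filesize` are assumed to be the properties that control flush and split size, and the chosen values of 128MB and 1GB are hypothetical picks within the ranges discussed, not recommendations made by anyone in the thread.

```python
MB = 1024 * 1024


def to_bytes(megabytes: int) -> int:
    """Convert a size in megabytes to bytes."""
    return megabytes * MB


# Assumed property names; the values are hypothetical examples only.
settings = {
    "hbase.hregion.memstore.flush.size": to_bytes(128),  # above the 64MB default
    "hbase.hregion.max.filesize": to_bytes(1024),        # 1GB region split size
}

if __name__ == "__main__":
    for name, value in settings.items():
        print(f"{name} = {value}")
```

Any such values would still need to be tested against real workloads and the available regionserver heap, as the thread itself stresses.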
-
Re: Limits on HBase
Ryan Rawson 2010-09-08, 04:36
There are 2 definitions of random access:
1) within a file (HDFS can be less than ideal)
2) randomly getting an entire file (not usually considered random gets)

For the latter, streaming an entire file from HDFS is actually pretty good. You can see a substantial percentage (think 80%+) of the raw disk performance. I benchmarked HDFS some time last year and got 90MB/sec just writing raw files.

-ryan

On Tue, Sep 7, 2010 at 9:07 PM, William Kang <[EMAIL PROTECTED]> wrote:
> Hi,
> What's the performance looks like if we put large cell in HDFS vs local file
> system? Random access to HDFS would be slow, right?
>
>
> William
>
> On Tue, Sep 7, 2010 at 11:30 PM, Jonathan Gray <[EMAIL PROTECTED]> wrote:
>
>> You can go way beyond the max region split / split size. HBase will never
>> split the region once it is a single row, even if beyond the split size.
>>
>> Also, if you're using large values, you should have region sizes much
>> larger than the default. It's common to run with 1-2GB regions in many
>> cases.
>>
>> What you may have seen are recommendations that if your cell values are
>> approaching the default block size on HDFS (64MB), you should consider
>> putting the data directly into HDFS rather than HBase.
>>
>> JG
>>
>> > -----Original Message-----
>> > From: William Kang [mailto:[EMAIL PROTECTED]]
>> > Sent: Tuesday, September 07, 2010 7:36 PM
>> > To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>> > Subject: Re: Limits on HBase
>> >
>> > Hi,
>> > Thanks for your reply. How about the row size? I read that a row should
>> > not
>> > be larger than the hdfs file on region server which is 256M in default.
>> > Is
>> > it right? Many thanks.
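To put the 90MB/sec figure in perspective, the sketch below estimates how long streaming a whole file would take and applies a simple size cutoff for choosing between HBase and HDFS. It rests on several assumptions: read throughput is taken to be similar to the raw write figure reported above, the 50MB cutoff is a hypothetical value loosely based on the cell sizes discussed in the thread, and the function names are illustrative rather than part of any HBase or HDFS API.

```python
HDFS_STREAM_MB_PER_SEC = 90.0    # benchmark figure from the thread (raw writes)
LARGE_VALUE_THRESHOLD_MB = 50.0  # hypothetical cutoff, not an HBase setting


def stream_seconds(file_mb: float,
                   mb_per_sec: float = HDFS_STREAM_MB_PER_SEC) -> float:
    """Estimate the time to stream an entire file at a steady throughput."""
    return file_mb / mb_per_sec


def suggested_store(value_mb: float,
                    threshold_mb: float = LARGE_VALUE_THRESHOLD_MB) -> str:
    """Suggest 'hdfs' for values at or above the cutoff, else 'hbase'."""
    return "hdfs" if value_mb >= threshold_mb else "hbase"


if __name__ == "__main__":
    for size_mb in (0.002, 20.0, 64.0, 100.0):
        print(size_mb, suggested_store(size_mb), round(stream_seconds(size_mb), 3))
```

The estimate covers only whole-file streaming; random access within a file is the case described above as less than ideal on HDFS.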