|
kavishahuja
2013-01-05, 10:11
谢良
2013-01-06, 03:58
Mohit Anchlia
2013-01-06, 05:45
谢良
2013-01-06, 06:14
Damien Hardy
2013-01-06, 09:35
Yusup Ashrap
2013-01-06, 11:58
Andrew Purtell
2013-01-06, 20:12
Asaf Mesika
2013-01-06, 20:28
Amandeep Khurana
2013-01-06, 20:33
Andrew Purtell
2013-01-06, 20:49
Andrew Purtell
2013-01-06, 20:52
Mohit Anchlia
2013-01-06, 21:09
Michael Segel
2013-01-11, 15:00
Mohammad Tariq
2013-01-11, 15:27
Mohit Anchlia
2013-01-11, 17:40
Jack Levin
2013-01-11, 17:47
Jack Levin
2013-01-11, 17:51
Marcos Ortiz
2013-01-11, 18:01
Mohit Anchlia
2013-01-13, 15:47
Jack Levin
2013-01-13, 16:17
Varun Sharma
2013-01-17, 23:29
Jack Levin
2013-01-20, 19:49
Varun Sharma
2013-01-22, 01:10
Varun Sharma
2013-01-22, 01:12
Jack Levin
2013-01-24, 04:53
S Ahmed
2013-01-24, 22:13
Jack Levin
2013-01-25, 07:41
S Ahmed
2013-01-27, 02:00
Jack Levin
2013-01-27, 02:56
yiyu jia
2013-01-27, 15:37
Jack Levin
2013-01-27, 16:56
yiyu jia
2013-01-27, 21:58
Jack Levin
2013-01-28, 04:06
Jack Levin
2013-01-28, 04:16
Adrien Mogenet
2013-01-28, 10:01
Jack Levin
2013-01-28, 18:08
Andrew Purtell
2013-01-28, 18:58
yiyu jia
2013-01-28, 20:23
Andrew Purtell
2013-01-28, 21:13
yiyu jia
2013-01-28, 21:44
Andrew Purtell
2013-01-28, 21:49
|
-
Storing images in Hbase kavishahuja 2013-01-05, 10:11
Hello everybody,
First of all, a happy new year to everyone! I need a little help with pushing images into Apache HBase. I know it is a matter of converting the objects into bytes and then saving those bytes into HBase rows, but I still can't get it to work. Kindly help! Regards, Kavish -- View this message in context: http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-tp4036184.html Sent from the HBase User mailing list archive at Nabble.com.
-
Re: Storing images in Hbase 谢良 2013-01-06, 03:58
Just out of curiosity, why not consider a blob storage system?
Best Regards, Liang
-
Re: Re: Storing images in Hbase Mohit Anchlia 2013-01-06, 05:45
IMHO, use DFS for blobs and use HBase for metadata.
Sent from my iPhone
-
Re: Re: Storing images in Hbase 谢良 2013-01-06, 06:14
HBase is not the best choice for blob (photo/image/...) storage (file sizes are often smaller than tens of MB).
Here are several blob storage systems:
Google blob storage: https://developers.google.com/appengine/docs/java/blobstore/overview
Facebook Haystack: http://www.facebook.com/note.php?note_id=76191543919
Twitter: http://engineering.twitter.com/2012/12/blobstore-twitters-in-house-photo.html
Taobao TFS: http://code.taobao.org/p/tfs/src/trunk/src/ (https://github.com/taobao/tfs)
Thanks,
-
Re: Storing images in Hbase Damien Hardy 2013-01-06, 09:35
Hi there,
Thank you, and happy new year. I had the same problem and wrote a Python module [0] for thumbor [1]. I use the Thrift interface for HBase to store image blobs. As already said, you have to keep image blobs quite small, ~100 KB (for web latency reasons you have to keep them small anyway), so that HBase keeps good performance. BTW, StumbleUpon stores all its assets in HBase: http://bb10.com/java-hadoop-hbase-user/2012-03/msg00054.html
[0] https://github.com/dhardy92/thumbor_hbase
[1] https://github.com/globocom/thumbor/wiki
Cheers, -- Damien
-
Re: Storing images in Hbase Yusup Ashrap 2013-01-06, 11:58
There are a lot of great discussions on Quora on this topic.
http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment
-
Re: Storing images in Hbase Andrew Purtell 2013-01-06, 20:12
Also, YFrog / ImageShack serves all of its assets out of HBase too, so for reasonably sized images some are having success.
See http://www.slideshare.net/jacque74/hug-hbase-presentation
-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
-
Re: Storing images in Hbase Asaf Mesika 2013-01-06, 20:28
What's the performance penalty of saving a very large value in a KeyValue in HBase?
Splits, scans, etc. Sent from my iPad
-
Re: Storing images in Hbase Amandeep Khurana 2013-01-06, 20:33
To add to Andy's point - storing images in HBase is fine as long as the size of each image isn't huge.
A couple of MBs per row in HBase does just fine. But once you start getting into 10s of MBs, there are more optimal solutions you can explore and HBase might not be the best bet. Amandeep
-
Re: Storing images in Hbase Andrew Purtell 2013-01-06, 20:49
What do you mean by "very large"?
One possible source of performance concern is that HBase RPC does not do positioned/chunked/partial reads, so on both the RegionServer and the client the entirety of the value data will be in the heap. A lot of really large objects brought in this way under high concurrency can cause excessive GC from fragmentation, or OOME conditions if the heap isn't adequately sized. The recommendation of ~10 MB max is to mitigate these effects. There is nothing scientific about that number though, it's a rule of thumb. I've built HBase applications with a max value size of 100 MB and it performed adequately. (Larger objects were split into 100 MB chunks and keyed as $rowkey$chunk, where $chunk was an integer serialized with Bytes.toBytes().)
Another is a consequence of the fact that a row cannot be split. This means that if the data in a single row grows significantly larger than the region split threshold, you will have this one region sized differently from the others, and this can lead to unexpected behavior. Consider if the split threshold is 2 GB but your one row contains 10 GB as really large value. This is undesirable because HBase expects housekeeping on a given region to be more or less equal to others: compaction, etc.
From the application POV, if you have a few really big value size outliers, then these could be like land mines if the app is short scanning over table data. Gets or Scans including such values will have widely varying latency from others. But this would be an application design problem.
-- Best regards, - Andy
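The chunking scheme Andrew describes (larger objects split into fixed-size chunks, each keyed as the row key followed by a chunk index) could look roughly like the sketch below. It is plain Python with no HBase client involved. The function names are made up for illustration, and the 4-byte big-endian suffix is an assumption chosen to match what `Bytes.toBytes(int)` produces in the HBase Java client.

```python
import struct

CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB, the per-value cap mentioned in the message


def chunk_key(row_key: bytes, chunk_index: int) -> bytes:
    """Row key followed by the chunk index as a 4-byte big-endian integer."""
    return row_key + struct.pack(">i", chunk_index)


def split_into_chunks(row_key: bytes, blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield (key, chunk) pairs that cover the blob in order."""
    for index, start in enumerate(range(0, len(blob), chunk_size)):
        yield chunk_key(row_key, index), blob[start:start + chunk_size]


def reassemble(pairs) -> bytes:
    """Sort by key (big-endian suffixes sort in numeric order) and join the chunks."""
    return b"".join(chunk for _, chunk in sorted(pairs))
```

Because the suffix is big-endian and non-negative, the chunk keys sort in byte order the same way they sort numerically, so a scan over the key prefix would return the chunks in the order needed to rebuild the object.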
-
Re: Storing images in Hbase Andrew Purtell 2013-01-06, 20:52
I meant this to say "a few really large values"
On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote: > Consider if the split threshold is 2 GB but your one row contains 10 GB as really large value.
-- Best regards, - Andy
-
Re: Storing images in Hbase Mohit Anchlia 2013-01-06, 21:09
I have done extensive testing and have found that blobs don't belong in databases but are rather best left out on the file system.
Andrew outlined the issues that you'll face, not to mention the IO issues when compaction occurs over large files.
-
Re: Storing images in Hbase Michael Segel 2013-01-11, 15:00
That's a viable option.
HDFS reads are faster than HBase, but it would require first hitting the index in HBase which points to the file and then fetching the file. It could be faster... we found storing binary data in a sequence file, indexed in HBase, to be faster than HBase alone. However, YMMV, and HBase has been improved since we did that project.
On Jan 10, 2013, at 10:56 PM, shashwat shriparv <[EMAIL PROTECTED]> wrote:
> Hi Kavish,
> I have a better idea for you: copy your image files to a single file on HDFS, and when a new image comes, append it to the existing file, and keep and update the metadata and the offset in HBase. Because if you put bigger images in HBase it will lead to some issues.
> Shashwat Shriparv
>
> On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
>> Interesting. That's close to a PB if my math is correct. Is there a write up about this somewhere? Something that we could link from the HBase homepage?
>> -- Lars
>>
>> ----- Original Message -----
>> From: Jack Levin <[EMAIL PROTECTED]>
>> To: [EMAIL PROTECTED]
>> Cc: Andrew Purtell <[EMAIL PROTECTED]>
>> Sent: Thursday, January 10, 2013 9:24 AM
>> Subject: Re: Storing images in Hbase
>>
>> We stored about 1 billion images into HBase with file size up to 10MB. It's been running for close to 2 years without issues and serves delivery of images for Yfrog and ImageShack. If you have any questions about the setup, I would be glad to answer them.
>> -Jack
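The layout discussed in this message (image bytes appended to one large file, with the offset and length kept in an index) can be illustrated with a small local stand-in. In the sketch below a local file plays the role of the HDFS file and a plain dict plays the role of the HBase index table. Both substitutions, and the function names, are assumptions for illustration only, not the setup any poster actually ran.

```python
import os
import tempfile


def append_image(container_path: str, index: dict, image_id: str, data: bytes) -> None:
    """Append the bytes to the container file and record (offset, length) in the index."""
    with open(container_path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)  # current end of file is where this image starts
        f.write(data)
    index[image_id] = (offset, len(data))


def read_image(container_path: str, index: dict, image_id: str) -> bytes:
    """Look up (offset, length) in the index, then do one positioned read."""
    offset, length = index[image_id]
    with open(container_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "images.bin")
    index = {}
    append_image(path, index, "cat.jpg", b"first image bytes")
    append_image(path, index, "dog.jpg", b"second image bytes")
    print(read_image(path, index, "dog.jpg"))
```

A read costs one index lookup plus one seek, which is the trade-off described above: the blob fetch itself is a plain file read, but every request has to hit the index first.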
-
Re: Storing images in Hbase Mohammad Tariq 2013-01-11, 15:27
IMHO, if the image files are not too huge, HBase can efficiently serve the purpose.
You can store some additional info along with the file, depending upon your search criteria, to make the search faster. Say you want to fetch images by type: you can store the image in one column and its extension in another column (jpg, tiff, etc.). BTW, what exactly is the problem you are facing? You wrote "But still i cant do it". Warm Regards, Tariq https://mtariq.jux.com/
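To make the "image in one column, extension in another" idea concrete, here is a minimal sketch of how a row could be assembled as bytes before it is handed to whatever client is in use. The column family `img` and the qualifiers `data` and `ext` are hypothetical names, and the function only builds the key and column map. It does not call any HBase client API.

```python
import os


def build_image_row(row_key: str, filename: str, data: bytes):
    """Return (row key, {column: value}), with the image and its extension in separate columns."""
    # Derive the extension from the file name, e.g. "Holiday.JPG" -> "jpg".
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return row_key.encode("utf-8"), {
        b"img:data": data,               # raw image bytes, stored unchanged
        b"img:ext": ext.encode("utf-8"),  # small value that is cheap to filter on
    }
```

The image bytes themselves would typically come from reading the file in binary mode, for example `open(path, "rb").read()`. No further conversion is needed, since HBase stores values as uninterpreted byte arrays.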
-
Re: Storing images in Hbase Mohit Anchlia 2013-01-11, 17:40
I think it really depends on the volume of traffic, the data distribution per region, how and when file compaction occurs, and the number of nodes in the cluster.
In my experience, when it comes to blob data where you are serving tens of thousands or more requests/sec of writes and reads, it's very difficult to manage HBase without heavy operations and maintenance effort. Jack mentioned earlier that they have 1 billion images. It would be interesting to know what they see in terms of compaction and number of requests per second. I'd be surprised if, on a high-volume site, it can be done without a caching layer on top to alleviate the IO spikes that occur because of GC and compactions.
-
Re: Storing images in Hbase Jack Levin 2013-01-11, 17:47
We buffer all accesses to HBase with a Varnish SSD-based caching layer, so the impact for reads is negligible.
We have a 70-node cluster with 8 GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), and 10-12 TB of disk per server. We are inserting 600,000 images per day. We have relatively little compaction activity, as we made our write cache much larger than the read cache, so we don't experience region file fragmentation as much. -Jack
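As a rough back-of-the-envelope check on the figures Jack gives (70 nodes, 10-12 TB of disk each, 600,000 inserts per day), the arithmetic can be written out as below. The replication factor of 3 is an assumption (it is the HDFS default, but the thread does not say what this cluster uses), so the usable-capacity numbers are only indicative.

```python
NODES = 70
DISK_TB_PER_NODE = (10, 12)   # low and high figures quoted in the message
IMAGES_PER_DAY = 600_000
REPLICATION = 3               # assumption: HDFS default, not stated in the thread

SECONDS_PER_DAY = 24 * 60 * 60

# Average insert rate, ignoring daily peaks.
inserts_per_second = IMAGES_PER_DAY / SECONDS_PER_DAY

# Raw disk across the cluster, then what is left after replication.
raw_tb = tuple(NODES * tb for tb in DISK_TB_PER_NODE)
usable_tb = tuple(tb / REPLICATION for tb in raw_tb)

if __name__ == "__main__":
    print(f"average inserts/sec: {inserts_per_second:.2f}")
    print(f"raw capacity (TB): {raw_tb[0]} to {raw_tb[1]}")
    print(f"usable at {REPLICATION}x replication (TB): {usable_tb[0]:.0f} to {usable_tb[1]:.0f}")
```

The average write rate comes out at only about 7 inserts per second, which fits the picture of a write load that is modest compared with the read traffic absorbed by the cache.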
-
Re: Storing images in Hbase Jack Levin 2013-01-11, 17:51
http://img338.imageshack.us/img338/6831/screenshot20130111at949.png
This shows how often we flush, and how large the region files are. We do have bloom filters turned on, so that we don't incur extra seeks across multiple RS files. -Jack
-
Re: Storing images in Hbase Marcos Ortiz 2013-01-11, 18:01
It would be nice to have a blog post about this.
-
Re: Storing images in Hbase Mohit Anchlia 2013-01-13, 15:47
Thanks, Jack, for sharing this information. This definitely makes sense when using that type of caching layer. You mentioned increasing the write cache; I am assuming you had to increase the following parameters in addition to the memstore size:

hbase.hregion.max.filesize
hbase.hregion.memstore.flush.size
-
RE: Storing images in Hbase Jack Levin 2013-01-13, 16:17
That's right, the memstore size, not the flush size, is increased. Filesize is 10G. Overall, write cache is 60% of heap and read cache is 20%. Flush size is 15%. 64 maxlogs at 128MB. One namenode server, one secondary that can be promoted. On the way to HBase, images are written to a queue, so that we can take HBase down for maintenance and still do inserts later. ImageShack has 'perma cache' servers that allow writes and serving of data even when HBase is down for hours; consider it a 4th replica 😉 outside of Hadoop.

Jack
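To put the percentages Jack gives into absolute terms, here is a rough Python sketch. It assumes a 5000 MB heap (the -Xms5000m value Jack posts later in the thread), and the function name cache_budget_mb is made up for illustration; it is not an HBase API.

```python
def cache_budget_mb(heap_mb, write_pct=60, read_pct=20):
    """Split a regionserver heap (in MB) into write cache, read cache and the rest."""
    write_mb = heap_mb * write_pct // 100   # memstore ("write cache") share
    read_mb = heap_mb * read_pct // 100     # block cache ("read cache") share
    other_mb = heap_mb - write_mb - read_mb  # whatever is left for everything else
    return write_mb, read_mb, other_mb


# Assumed heap size: 5000 MB, taken from the -Xms5000m flag quoted in the thread.
write_mb, read_mb, other_mb = cache_budget_mb(5000)

# "64 maxlogs at 128MB": total write-ahead log volume before the log limit is hit.
wal_capacity_mb = 64 * 128

print(write_mb, read_mb, other_mb, wal_capacity_mb)
```

Under these assumptions the log limit is well above the write cache share, so on this reading flushes would be driven by memstore pressure rather than by the number of logs.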
-
Re: Storing images in Hbase Varun Sharma 2013-01-17, 23:29
Hey Jack,

Thanks for the useful information. By flush size being 15%, do you mean the memstore flush size? 15% would mean close to 1G; have you seen any issues with flushes taking too long?

Thanks
Varun
-
Re: Storing images in Hbase Jack Levin 2013-01-20, 19:49
I forgot to mention that I also have this setup:

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>33554432</value>
  <description>Flush more often. Default: 67108864</description>
</property>

This parameter applies per region, which means that if any of my 400 (currently) regions on a regionserver has 30MB+ in its memstore, HBase will flush it to disk.

Here are some metrics from a regionserver:

requests=2, regions=370, stores=370, storefiles=1390,
storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0,
flushQueueSize=0, usedHeap=3516, maxHeap=4987,
blockCacheSize=790656256, blockCacheFree=255245888,
blockCacheCount=2436, blockCacheHitCount=218015828,
blockCacheMissCount=13514652, blockCacheEvictedCount=2561516,
blockCacheHitRatio=94, blockCacheHitCachingRatio=98

Note that the memstore is only 2G; this particular regionserver's heap is set to 5G.

And last but not least, it's very important to have a good GC setup:

export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms5000m
-XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+HeapDumpOnOutOfMemoryError -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
-XX:MaxTenuringThreshold=15 -XX:SurvivorRatio=8 \
-XX:+UseParNewGC \
-XX:NewSize=128m -XX:MaxNewSize=128m \
-XX:-UseAdaptiveSizePolicy \
-XX:+CMSParallelRemarkEnabled \
-XX:-TraceClassUnloading
"

-Jack
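The regionserver metrics line Jack posted can be cross-checked against the settings he describes. The Python sketch below parses the line and recomputes a few figures. The units are assumptions, not something the thread states: maxHeap is read as megabytes and the blockCache* sizes as bytes. parse_metrics is a helper written for this example only.

```python
metrics_line = (
    "requests=2, regions=370, stores=370, storefiles=1390, "
    "storefileIndexSize=304, memstoreSize=2233, compactionQueueSize=0, "
    "flushQueueSize=0, usedHeap=3516, maxHeap=4987, "
    "blockCacheSize=790656256, blockCacheFree=255245888, "
    "blockCacheCount=2436, blockCacheHitCount=218015828, "
    "blockCacheMissCount=13514652, blockCacheEvictedCount=2561516, "
    "blockCacheHitRatio=94, blockCacheHitCachingRatio=98"
)


def parse_metrics(line):
    """Turn 'key=value, key=value' text into a dict of integers."""
    return {key: int(value) for key, value in
            (item.split("=") for item in line.split(", "))}


m = parse_metrics(metrics_line)

# Hit ratio recomputed from the raw counters, truncated to a whole percent.
hits, misses = m["blockCacheHitCount"], m["blockCacheMissCount"]
hit_ratio_pct = 100 * hits // (hits + misses)

# Block cache capacity (used + free), assumed bytes, as a share of maxHeap (assumed MB).
block_cache_mb = (m["blockCacheSize"] + m["blockCacheFree"]) / 2**20
block_cache_share = block_cache_mb / m["maxHeap"]

# The configured per-region flush threshold, converted from bytes to MiB.
flush_size_mb = 33554432 // 2**20

print(hit_ratio_pct, round(block_cache_share, 2), flush_size_mb)
```

If the unit assumptions hold, the recomputed hit ratio matches the reported blockCacheHitRatio, and the block cache share agrees with the 20% read cache mentioned earlier in the thread.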
-
Re: Storing images in Hbase Varun Sharma 2013-01-22, 01:10
Thanks for the useful information. I wonder why you use only a 5G heap when you have an 8G machine? Is there a reason not to use all of it? (The DataNode typically takes about 1G of RAM.)
-
Re: Storing images in Hbase Varun Sharma 2013-01-22, 01:12
On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma <[EMAIL PROTECTED]> wrote:
> Thanks for the useful information. I wonder why you use only 5G heap when
> you have an 8G machine ? Is there a reason to not use all of it (the
> DataNode typically takes a 1G of RAM)
15 % would mean close to 1G, have you seen any >> > issues with flushes taking too long ? >> > >> > Thanks >> > Varun >> > >> > On Sun, Jan 13, 2013 at 8:17 AM, Jack Levin <[EMAIL PROTECTED]> wrote: >> > >> >> That's right, Memstore size , not flush size is increased. Filesize is >> >> 10G. Overall write cache is 60% of heap and read cache is 20%. Flush >> size >> >> is 15%. 64 maxlogs at 128MB. One namenode server, one secondary that >> can >> >> be promoted. On the way to hbase images are written to a queue, so >> that we >> >> can take Hbase down for maintenance and still do inserts later. >> ImageShack >> >> has ‘perma cache’ servers that allows writes and serving of data even >> when >> >> hbase is down for hours, consider it 4th replica 😉 outside of hadoop >> >> >> >> Jack >> >> >> >> *From:* Mohit Anchlia <[EMAIL PROTECTED]> >> >> *Sent:* January 13, 2013 7:48 AM >> >> *To:* [EMAIL PROTECTED] >> >> *Subject:* Re: Storing images in Hbase >> >> >> >> Thanks Jack for sharing this information. This definitely makes sense >> when >> >> using the type of caching layer. You mentioned about increasing write >> >> cache, I am assuming you had to increase the following parameters in >> >> addition to increase the memstore size: >> >> >> >> hbase.hregion.max.filesize >> >> hbase.hregion.memstore.flush.size >> >> >> >> On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <[EMAIL PROTECTED]> wrote: >> >> >> >> > We buffer all accesses to HBASE with Varnish SSD based caching layer. >> >> > So the impact for reads is negligible. We have 70 node cluster, 8 GB >> >> > of RAM per node, relatively weak nodes (intel core 2 duo), with >> >> > 10-12TB per server of disks. Inserting 600,000 images per day. We >> >> > have relatively little of compaction activity as we made our write >> >> > cache much larger than read cache - so we don't experience region >> file >> >> > fragmentation as much. >> >> > >> >> > -Jack >> >>
-
Re: Storing images in Hbase Jack Levin 2013-01-24, 04:53
It's best to keep some RAM for filesystem caching; besides, we also run a datanode, which takes heap as well. Now, please keep in mind that even if you specify a heap of, say, 5GB, if your server opens threads to communicate with other systems via RPC (which HBase does a lot), you will actually use HEAP + Nthreads * per-thread stack size (in KB). There is a good Sun Microsystems document about it (I don't have the link handy).

-Jack
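Jack's rule of thumb (heap plus number of threads times per-thread stack size) can be sketched in a few lines of Python. The thread count of 1000 is purely hypothetical, and the 1024 KB stack size is an assumption (a common default on 64-bit Linux JVMs, adjustable with -Xss). The 8 GB of RAM per node and the roughly 1G DataNode figure come from earlier messages in the thread.

```python
def process_footprint_mb(heap_mb, n_threads, stack_kb=1024):
    """Estimate JVM memory use as heap plus thread stacks, in MB."""
    return heap_mb + n_threads * stack_kb / 1024


# Hypothetical load: 1000 RPC threads on top of the 5000 MB heap.
footprint_mb = process_footprint_mb(5000, 1000)

# What remains on an 8 GB node after the regionserver and a ~1 GB DataNode.
headroom_mb = 8192 - footprint_mb - 1024

print(footprint_mb, headroom_mb)
```

With these made-up numbers only a little over 1 GB is left for the OS and filesystem cache, which is one way to read Jack's reluctance to give the regionserver the whole machine.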
-
Re: Storing images in Hbase S Ahmed 2013-01-24, 22:13
Jack, out of curiosity, how many people manage the HBase-related servers?

Does it require constant monitoring, or is it fairly hands-off now? (Or a bit of both: the early days were spent getting things right and learning, and now it's purring along.)
-
Re: Storing images in HbaseJack Levin 2013-01-25, 07:41
Two people, including myself; it's fairly hands-off. It took about 3 months to tune it right, although we had multiple years of experience with datanodes and Hadoop in general, so that was a good boost.

We have 4 HBase clusters today, the image store being the largest.
-
Re: Storing images in HbaseS Ahmed 2013-01-27, 02:00
That's pretty amazing.
What I am confused about is: why did you go with HBase rather than writing straight into HDFS?
-
Re: Storing images in HbaseJack Levin 2013-01-27, 02:56
AFAIK, the namenode would not like tracking 20 billion small files :)

-jack
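
For a rough sense of scale behind the small-files remark, here is a back-of-the-envelope sketch in Java. The figure of about 150 bytes of namenode heap per namespace object (file or block) is a commonly quoted rule of thumb, not a number from this thread, and the one-block-per-file ratio is likewise an assumption.

```java
/** Rough namenode heap estimate for a namespace full of small files. */
final class NameNodeHeapEstimate {
    // Assumption (rule of thumb, not from the thread): about 150 bytes of
    // namenode heap per namespace object, where each file and each block
    // counts as one object.
    static final long BYTES_PER_OBJECT = 150L;

    /**
     * @param files         number of files tracked by the namenode
     * @param blocksPerFile blocks per file (1 for images smaller than a block)
     * @return estimated heap in bytes under the assumption above
     */
    static long estimateHeapBytes(long files, long blocksPerFile) {
        // One object for the file itself plus one per block.
        long objects = Math.multiplyExact(files, 1 + blocksPerFile);
        return Math.multiplyExact(objects, BYTES_PER_OBJECT);
    }
}
```

Under those assumptions, 20 billion single-block files work out to about 6 TB of namenode heap, which no single machine of that era could hold. Packing many small images into far fewer, larger HBase storefiles avoids that.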
-
Re: Storing images in Hbaseyiyu jia 2013-01-27, 15:37
Hi Jack,
Thanks so much for sharing! Do you have any comments on storing video in HDFS?

Thanks and regards,
Yiyu
-
Re: Storing images in HbaseJack Levin 2013-01-27, 16:56
We did some experiments; the open source project HOOP works well for interfacing with HDFS, exposing a REST API to your file system.

-Jack
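
To make the REST idea concrete, here is a minimal Java sketch that only builds the URL for reading a file. It assumes the WebHDFS-style URL layout (`/webhdfs/v1/<path>?op=OPEN`), which, as far as I know, is what Hoop's successor HttpFS exposes. The host, port, path, and user name are hypothetical placeholders, and nothing here is taken from the setup described in the thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Builds a WebHDFS-style "open file" URL for an HttpFS/Hoop-like gateway. */
final class HttpFsUrl {
    // Assumption: the gateway serves the WebHDFS REST layout under this prefix.
    private static final String API_PREFIX = "/webhdfs/v1";

    static URI openUri(String host, int port, String hdfsPath, String user) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        // op=OPEN asks for the file contents; user.name is the simple
        // (non-Kerberos) way of identifying the caller.
        String query = "op=OPEN&user.name=" + user;
        try {
            return new URI("http", null, host, port, API_PREFIX + hdfsPath, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URL for " + hdfsPath, e);
        }
    }
}
```

A client would then issue a plain HTTP GET against the returned URI, so anything that can speak HTTP can read from the file system without Hadoop client libraries.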
-
Re: Storing images in Hbaseyiyu jia 2013-01-27, 21:58
Hi Jack,
Thank you. I had never heard of HOOP before; I should learn it.

Also, do you store the metadata of each video clip directly in HDFS, or do you have other storage such as memcache?

Thanks and regards,
Yiyu
-
Re: Storing images in HbaseJack Levin 2013-01-28, 04:06
We store image/media data in a second HBase cluster, but I don't see a reason why it would not work in the same cluster, as a separate column family for example.

-Jack
-
Re: Storing images in HbaseJack Levin 2013-01-28, 04:16
One thing to note about our setup is that images that come from end users (uploads) are first inserted into a queuing system, from which they are pulled by an asynchronous job and inserted into HBase. This allows us to bring HBase down for maintenance without losing any of the uploads in the process.

Our namenode being a single point of failure is not really an issue: if you run multiple HBase clusters, you can replicate data between them, so if you lose the first cluster forever, you still have your data/images on the secondary cluster, which will function normally. At some point it is also possible to run a distributed copy between the HDFS instances if needed.

Just to do some shameless self-promotion :) Our company is taking consulting orders: we can set up Hadoop/HBase for image storage with hardware of your choice, we can sell a rack worth a petabyte of 'elastic' HBase store, or we can rent out our own cluster with a RESTful API. If anyone's interested, please ping me off the list. Thanks.

-Jack
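
The queue-in-front-of-HBase arrangement described in this message can be sketched roughly as follows. This is an illustration only: the `ImageStore` interface is a hypothetical stand-in for the actual HBase write, the in-memory queue stands in for what would need to be a durable queuing system, and a single draining job is assumed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Buffers uploads so the backing store can be taken down without losing them. */
final class UploadQueue {
    /** Hypothetical stand-in for the HBase write; returns false while the store is down. */
    interface ImageStore {
        boolean put(String key, byte[] image);
    }

    private static final class Upload {
        final String key;
        final byte[] image;

        Upload(String key, byte[] image) {
            this.key = key;
            this.image = image;
        }
    }

    // In-memory for illustration; a real deployment would use a durable queue.
    private final BlockingQueue<Upload> pending = new LinkedBlockingQueue<>();

    /** Called on the upload path: never touches the store directly. */
    void accept(String key, byte[] image) {
        pending.add(new Upload(key, image));
    }

    /**
     * Called by the asynchronous job. Writes queued uploads in order and stops
     * at the first failure, leaving that upload and the rest queued for later.
     *
     * @return number of uploads written during this pass
     */
    int drain(ImageStore store) {
        int written = 0;
        Upload next;
        while ((next = pending.peek()) != null) {
            if (!store.put(next.key, next.image)) {
                break; // store is down: keep the upload queued and retry later
            }
            pending.remove(); // remove only after a successful write
            written++;
        }
        return written;
    }

    int size() {
        return pending.size();
    }
}
```

Because an upload is removed only after a successful write, a maintenance window just makes the queue grow, and the next `drain` pass after the store returns catches up.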
-
Re: Storing images in HbaseAdrien Mogenet 2013-01-28, 10:01
Could HCatalog be an option?
-
Re: Storing images in HbaseJack Levin 2013-01-28, 18:08
I've never tried it. HBase worked out nicely for this task; caching and all is a bonus for files.

-jack
-
Re: Storing images in HbaseAndrew Purtell 2013-01-28, 18:58
If I were to design a large object store on HBase, I would do the following: under a threshold, store the object data in HBase. Over the threshold, store only the metadata for the object in HBase and the object data itself in a file in HDFS. The threshold could be a fixed byte size like 100 MB, or you could segment storage by MIME type, for example image/* into HBase and video/* into HDFS. Video objects might be as large as 5-10 GB for full-length features, depending on encoding bitrate.

HBase can pack millions or billions of small objects into much larger indexed files that can be quickly retrieved, and this helps avoid namespace pressure on the HDFS NameNode. However, the HBase API cannot do positioned reads of partial byte ranges of stored objects, while the HDFS API can. Put smaller objects into HBase. Put larger objects into HDFS so you can stream them at approximately the same rate that the end user reads them and minimize the overhead of server-side buffering.

As Jack mentions, there is Hoop (https://github.com/cloudera/hoop) or WebHDFS (http://hadoop.apache.org/docs/stable/webhdfs.html) for accessing HDFS via a RESTful API. Both will let you do positioned reads of partial byte ranges out of HDFS. On the HBase side, there is HBase's REST interface (http://wiki.apache.org/hadoop/Hbase/Stargate).

Put a cache between the HDFS and HBase services and the front end, because even with the capabilities of HBase and HDFS you should always have a caching tier between the datastore and the front end.

Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
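The routing rule described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not anyone's production code: the 100 MB threshold and the image/video split come from the message, while the helper names, the choice to let the MIME type take precedence over size, the treatment of an object exactly at the threshold, and the host, port, and path in the usage line are all hypothetical. The `op=OPEN` operation with `offset` and `length` query parameters is how the WebHDFS REST API exposes positioned reads.

```python
from urllib.parse import quote, urlencode

# Threshold suggested in the message; tune it for your own workload.
SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024


def choose_store(size_bytes, mime_type=None, threshold=SIZE_THRESHOLD_BYTES):
    """Return "hbase" or "hdfs" for an object (hypothetical helper).

    Assumption: when a MIME type is given it takes precedence, with
    image/* going to HBase and video/* going to HDFS. Otherwise the
    decision falls back to the byte-size threshold, and an object
    exactly at the threshold is sent to HDFS.
    """
    if mime_type:
        if mime_type.startswith("image/"):
            return "hbase"
        if mime_type.startswith("video/"):
            return "hdfs"
    return "hbase" if size_bytes < threshold else "hdfs"


def webhdfs_range_url(host, port, path, offset, length):
    """Build a WebHDFS OPEN URL that reads `length` bytes starting at `offset`."""
    query = urlencode({"op": "OPEN", "offset": offset, "length": length})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), query)


# Hypothetical usage: fetch 64 KB of a large video starting at the 1 MB mark.
url = webhdfs_range_url("namenode.example.com", 50070, "/videos/a.mp4", 1048576, 65536)
```

A front end serving video would issue one such request per chunk as the player advances, which is what keeps server-side buffering small.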
-
Re: Storing images in Hbaseyiyu jia 2013-01-28, 20:23
Hi Jack,

Thank you for sharing!

Hello Andrew,

You mentioned an interesting topic: caching. My question is why I would need a cache between HBase and HDFS if I already have a cache configured between HBase and its caller application.

Let's say I have a web application that uses HBase as its data source at the back end. I have a cache configured in my reverse proxy, which sits in front of my web server, and the cache is keyed on URL pattern or parameters. In this case, the cached data will be delivered to the client whenever the input parameters or URL are the same, so the same data cached behind the web server will never be hit. If this is the case, I would say the cache between HBase and HDFS will not be helpful.

But I think the real case is not as simple as I described above. Can you please expand a little on the cache topic?

thanks and regards,

Yiyu
-
Re: Storing images in HbaseAndrew Purtell 2013-01-28, 21:13
You bring up a very common consideration, I think.

For static content, such as images, a cache can help offload read load from the datastore. This fits into this conversation.

For dynamic content, external caching may not be helpful, as you say, although the blockcache within HBase will help if you are assembling content dynamically from repeated queries, some of which bring in the same data over and over.

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
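To make the static-content case concrete, here is a minimal read-through cache sketch. It is an assumption-laden toy, not a description of any setup mentioned in the thread: the class name, the LRU eviction policy, and the `fetch_image` stand-in for a datastore read are all hypothetical. It only shows how repeated requests for the same image stop reaching the datastore.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU cache placed in front of a slower datastore lookup."""

    def __init__(self, fetch, capacity=1024):
        self._fetch = fetch            # callable: key -> value (hits the datastore)
        self._capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)             # only misses reach the datastore
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value


# Hypothetical stand-in for an HBase or HDFS read.
backend_reads = []


def fetch_image(key):
    backend_reads.append(key)
    return b"bytes-of-" + key.encode()


cache = ReadThroughCache(fetch_image, capacity=2)
for name in ["a.jpg", "a.jpg", "b.jpg", "a.jpg"]:
    cache.get(name)
```

In this run the datastore is read twice for four requests. For dynamic content assembled per request, the keys rarely repeat at this tier, which is why the message points to the HBase blockcache instead.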
-
Re: Storing images in Hbaseyiyu jia 2013-01-28, 21:44
Hi Andy,

Thanks a lot for sharing. Yes, I am not talking about static content caching, which might be called an internal CDN today. I am asking about techniques for configuring caches at different layers while avoiding duplicate caching across those layers.

thanks and regards,

Yiyu
-
Re: Storing images in HbaseAndrew Purtell 2013-01-28, 21:49
In that case, hypothetically speaking, you could disable the HBase blockcache on the table containing static content and rely on an external reverse proxy tier, and enable the HBase blockcache on the tables that you use to generate dynamic content.

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) |