HBase >> mail # user >> Storing images in Hbase


Re: Storing images in Hbase
What do you mean by "very large"?

One possible performance concern is that HBase RPC does not do
positioned/chunked/partial reads, so on both the RegionServer and the client
the entire value will be in the heap. A lot of really large
objects brought in this way under high concurrency can cause excessive GC
from fragmentation, or OOME conditions if the heap isn't adequately sized.
The recommendation of ~10 MB max is meant to mitigate these effects. There is
nothing scientific about that number, though; it's a rule of thumb. I've
built HBase applications with a max value size of 100 MB and they performed
adequately. (Larger objects were split into 100 MB chunks and keyed as
$rowkey$chunk, where $chunk was an integer serialized with Bytes.toBytes().)
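
The chunking scheme described above could be sketched roughly as follows. This is only an illustration in plain Java, not the code from the applications mentioned: the class name ChunkedValue and the helpers chunkKey and split are hypothetical, and it has no HBase dependency. It assumes the chunk index is appended to the row key as a 4-byte big-endian int, which is the layout Bytes.toBytes(int) produces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: splits a large value into
// fixed-size chunks and builds one row key per chunk.
class ChunkedValue {
    // 100 MB chunks, as in the post; any positive size works.
    static final int CHUNK_SIZE = 100 * 1024 * 1024;

    // Row key for one chunk: the base row key followed by the chunk index
    // as a 4-byte big-endian int (ByteBuffer is big-endian by default).
    static byte[] chunkKey(byte[] rowKey, int chunk) {
        return ByteBuffer.allocate(rowKey.length + Integer.BYTES)
                .put(rowKey)
                .putInt(chunk)
                .array();
    }

    // Splits value into consecutive pieces of at most chunkSize bytes.
    // An empty value yields an empty list.
    static List<byte[]> split(byte[] value, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int off = 0;
        while (off < value.length) {
            int len = Math.min(chunkSize, value.length - off);
            chunks.add(Arrays.copyOfRange(value, off, off + len));
            off += len;
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] rowKey = "image-0001".getBytes(StandardCharsets.UTF_8);
        byte[] value = new byte[25];            // stand-in for a large object
        List<byte[]> chunks = split(value, 10); // tiny chunk size for the demo
        for (int i = 0; i < chunks.size(); i++) {
            byte[] key = chunkKey(rowKey, i);
            // In a real application each (key, chunk) pair would go into its own Put.
            System.out.println(key.length + "-byte key, " + chunks.get(i).length + "-byte chunk");
        }
    }
}
```

Because the index is big-endian and non-negative, the chunk keys for one object sort in chunk order, so a short scan starting at the base row key should return the pieces in sequence.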

Another is a consequence of the fact that a row cannot be split. This means that
if the data in a single row grows significantly larger than the region
split threshold, you will have this one region sized differently from the
others, and this can lead to unexpected behavior. Consider a split
threshold of 2 GB where one row contains a single really large 10 GB value.
This is undesirable because HBase expects housekeeping on a given region
(compaction, etc.) to cost more or less the same as on the others.

From the application POV, if you have a few really big value size outliers,
then these could be like land mines if the app does short scans over table
data. Gets or Scans that include such values will have latency that varies
widely from the others. But this would be an application design problem.

On Sun, Jan 6, 2013 at 12:28 PM, Asaf Mesika <[EMAIL PROTECTED]> wrote:

> What's the penalty performance wise of saving a very large value in a
> KeyValue in hbase? Splits, scans, etc.
>
> Sent from my iPad
>
> On 6 Jan 2013, at 22:12, Andrew Purtell <[EMAIL PROTECTED]> wrote:
>
> > Also YFrog / ImageShack serves all of its assets out of HBase too, so for
> > reasonably sized images some are having success. See
> > http://www.slideshare.net/jacque74/hug-hbase-presentation
> >
> >
> > On Sun, Jan 6, 2013 at 3:58 AM, Yusup Ashrap <[EMAIL PROTECTED]> wrote:
> >
> >> There are a lot of great discussions on Quora on this topic.
> >>
> >> http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
> >> http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
> >> http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >   - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
>

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)