

Re: Storing images in Hbase
I think it really depends on the volume of traffic, the data distribution per
region, how and when file compactions occur, and the number of nodes in the
cluster. In my experience, when it comes to blob data where you are serving
tens of thousands of read and write requests per second, it is very difficult
to manage HBase without heavy operations and maintenance work. Jack mentioned
earlier that they have 1 billion images. It would be interesting to know what
they see in terms of compactions and number of requests per second. I'd be
surprised if a high-volume site could do this without a caching layer on top
to absorb the IO spikes that occur because of GC and compactions.
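
To make the caching-layer point concrete, here is a minimal sketch of a read-through LRU cache that sits in front of the blob store, so that only cache misses reach the backend. Everything in it is an assumption for illustration: `ReadThroughCache`, `fake_store_get`, and the byte budget are hypothetical names and numbers, and the backend is a plain function standing in for an HBase read.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU cache bounded by total bytes, placed in front of a blob store."""

    def __init__(self, fetch, max_bytes):
        self._fetch = fetch          # hypothetical callable: key -> bytes
        self._max_bytes = max_bytes
        self._items = OrderedDict()  # key -> bytes, least recently used first
        self._size = 0
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        blob = self._fetch(key)           # only misses reach the backing store
        if len(blob) <= self._max_bytes:  # skip blobs that could never fit
            self._items[key] = blob
            self._size += len(blob)
            while self._size > self._max_bytes:
                _, evicted = self._items.popitem(last=False)  # evict LRU entry
                self._size -= len(evicted)
        return blob


# Stand-in for the backing store; records every call that gets through.
backend_calls = []


def fake_store_get(key):
    backend_calls.append(key)
    return b"x" * 100  # pretend every image is 100 bytes


cache = ReadThroughCache(fake_store_get, max_bytes=250)  # room for two images
for key in ["a", "b", "a", "a", "c", "b"]:
    cache.get(key)
```

With this access pattern, two of the six reads are served from memory and never touch the backend. A real deployment would more likely use an existing cache or CDN than hand-rolled code like this.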

On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <[EMAIL PROTECTED]> wrote:

> IMHO, if the image files are not too large, HBase can serve the purpose
> efficiently. You can store some additional info along with the file,
> depending on your search criteria, to make lookups faster. For example, if
> you want to fetch images by type, you can store the image in one column and
> its extension (jpg, tiff, etc.) in another column.
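
The row layout described above might look roughly like this. It is a sketch only: the table is modelled as an in-memory dict rather than a real HBase client, and the column names `d:img` and `d:ext` and the row keys are hypothetical.

```python
# In-memory stand-in for a table: row key -> {"family:qualifier": value}.
table = {}


def put_image(row_key, data, ext):
    """Store the image bytes and its extension in separate columns."""
    table[row_key] = {"d:img": data, "d:ext": ext.encode()}


def row_keys_with_ext(ext):
    """Find rows by type by looking only at the small extension column."""
    want = ext.encode()
    return sorted(k for k, cols in table.items() if cols["d:ext"] == want)


put_image("img-001", b"jpeg-bytes-1", "jpg")
put_image("img-002", b"tiff-bytes-1", "tiff")
put_image("img-003", b"jpeg-bytes-2", "jpg")

jpg_rows = row_keys_with_ext("jpg")
```

The point of the separate column is that the type check compares a few bytes of metadata and does not have to look at the image payload itself.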
>
> BTW, what exactly is the problem you are facing? You wrote
> "But I still cant do it".
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
>
>
> On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <[EMAIL PROTECTED]> wrote:
>
> > That's a viable option.
> > HDFS reads are faster than HBase reads, but this approach requires first
> > hitting the index in HBase, which points to the file, and then fetching
> > the file.
> > It could be faster... we found that storing binary data in a sequence file
> > indexed in HBase was faster than storing it in HBase itself. However, YMMV,
> > and HBase has been improved since we did that project.
> >
> >
> > On Jan 10, 2013, at 10:56 PM, shashwat shriparv <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Kavish,
> > >
> > > > I have a better idea for you: copy your image files into a single file
> > > > on HDFS, and when a new image comes in, append it to that file. Keep
> > > > and update the metadata and the offset in HBase. If you put larger
> > > > images directly in HBase, it will lead to issues.
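
The append-and-index idea above can be sketched as follows. This is an illustration under stated assumptions, not a working HDFS/HBase integration: a local temporary file stands in for the HDFS container file, a dict stands in for the HBase index table, and the function names are hypothetical.

```python
import os
import tempfile

# Stand-in for the HBase index table: image id -> (offset, length).
index = {}


def append_image(container_path, image_id, data):
    """Append image bytes to the container and record where they landed."""
    with open(container_path, "ab") as f:
        f.seek(0, os.SEEK_END)  # make sure tell() reports the end of file
        offset = f.tell()
        f.write(data)
    index[image_id] = (offset, len(data))


def read_image(container_path, image_id):
    """Look up the offset in the index, then read just that byte range."""
    offset, length = index[image_id]
    with open(container_path, "rb") as f:
        f.seek(offset)
        return f.read(length)


container = os.path.join(tempfile.mkdtemp(), "images.bin")
append_image(container, "cat", b"CATDATA")
append_image(container, "dog", b"DOGGYDATA")

dog_bytes = read_image(container, "dog")
```

Each read costs one index lookup plus one ranged file read, which is the two-step access pattern discussed in the replies to this suggestion.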
> > >
> > >
> > >
> > > ∞
> > > Shashwat Shriparv
> > >
> > >
> > >
> > > > On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <[EMAIL PROTECTED]> wrote:
> > >
> > >> Interesting. That's close to a PB if my math is correct.
> > >> Is there a write up about this somewhere? Something that we could link
> > >> from the HBase homepage?
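
For reference, the "close to a PB" estimate can be checked against the figures in the quoted message (about 1 billion images, up to 10 MB each). The thread does not give an average image size, so the 1 MB average below is purely an assumption, and decimal units (1 PB = 10^15 bytes) are assumed as well.

```python
IMAGES = 1_000_000_000           # "about 1 billion images"
MAX_IMAGE_BYTES = 10 * 10**6     # "file size up to 10MB"
PB = 10**15                      # decimal petabyte

# Upper bound: every single image at the 10 MB cap.
upper_bound_pb = IMAGES * MAX_IMAGE_BYTES / PB

# Assumption (not stated in the thread): images average about 1 MB.
assumed_avg_bytes = 1 * 10**6
estimate_pb = IMAGES * assumed_avg_bytes / PB

print(f"upper bound: {upper_bound_pb} PB, at 1 MB average: {estimate_pb} PB")
```

So the data set is at most 10 PB before replication, and roughly 1 PB if the average image is around 1 MB.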
> > >>
> > >> -- Lars
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: Jack Levin <[EMAIL PROTECTED]>
> > >> To: [EMAIL PROTECTED]
> > >> Cc: Andrew Purtell <[EMAIL PROTECTED]>
> > >> Sent: Thursday, January 10, 2013 9:24 AM
> > >> Subject: Re: Storing images in Hbase
> > >>
> > > >> We have stored about 1 billion images in HBase, with file sizes up to
> > > >> 10 MB. It's been running for close to 2 years without issues and serves
> > > >> delivery of images for Yfrog and ImageShack. If you have any
> > > >> questions about the setup, I would be glad to answer them.
> > >>
> > >> -Jack
> > >>
> > > >> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <[EMAIL PROTECTED]> wrote:
> > > >>> I have done extensive testing and have found that blobs don't belong
> > > >>> in databases but are rather best left out on the file system. Andrew
> > > >>> outlined the issues you'll face, not to mention the IO issues when
> > > >>> compaction occurs over large files.
> > >>>
> > > >>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>> I meant this to say "a few really large values"
> > >>>>
> > > >>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <[EMAIL PROTECTED]> wrote:
> > >>>>
> > > >>>>> Consider if the split threshold is 2 GB but your one row contains
> > > >>>>> 10 GB as really large value.
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>>
> > >>>>   - Andy
> > >>>>
> > >>>