HBase >> mail # user >> Storing images in Hbase


Re: Storing images in Hbase
One thing to note about our setup: images that come from end-users
(uploads) are first inserted into a queuing system, from which they
are pulled by an asynchronous job and inserted into HBase. This
allows us to bring HBase down for maintenance without losing any of
the uploads in the process.
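The buffering pattern described above can be sketched roughly like this (a minimal sketch: the in-process `queue.Queue` stands in for a durable external queuing system, and `store_in_hbase` is a hypothetical stand-in for the real HBase put):

```python
import queue
import threading

upload_queue = queue.Queue()   # stand-in for a durable queuing system
stored = []                    # stand-in for the HBase table

def store_in_hbase(image_id, image_bytes):
    # Hypothetical: in the real setup this would be an HBase put. If
    # HBase is down for maintenance, the worker would retry and the
    # upload would simply wait in the queue.
    stored.append((image_id, image_bytes))

def async_writer():
    # Asynchronous job that drains the queue into HBase.
    while True:
        item = upload_queue.get()
        if item is None:       # sentinel to stop the worker
            break
        store_in_hbase(*item)
        upload_queue.task_done()

worker = threading.Thread(target=async_writer)
worker.start()

# End-user uploads go into the queue, never directly into HBase.
upload_queue.put(("img-001", b"\x89PNG..."))
upload_queue.put(("img-002", b"\xff\xd8JPEG..."))

upload_queue.put(None)
worker.join()
print(len(stored))  # both uploads eventually land in the store
```

The point of the indirection is that the producer (the upload handler) never blocks on, or fails with, the storage tier.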

Our namenode being a single point of failure is not really an issue,
because if you run multiple HBase clusters you can replicate data
between them. So if you lose the first cluster forever, you still
have your data/images on the secondary cluster, which will keep
functioning normally. It is also possible to run a distributed copy
between the HDFS instances if needed.
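As a rough sketch of what that looks like operationally (host names, table and family names, and paths here are hypothetical placeholders; the commands follow the HBase shell replication syntax of that era and Hadoop's DistCp):

```shell
# On the primary cluster: register the secondary cluster's ZooKeeper
# quorum as a replication peer, and mark the column family so its
# edits are shipped to the peer (names are placeholders).
hbase shell <<'EOF'
add_peer '1', "zk1.cluster-b:2181:/hbase"
alter 'images', {NAME => 'img', REPLICATION_SCOPE => 1}
EOF

# One-off bulk copy between the two HDFS instances, if ever needed:
hadoop distcp hdfs://cluster-a:8020/hbase/images \
              hdfs://cluster-b:8020/hbase/images
```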

Just to do some shameless self-promotion :) -- Our company is taking
consulting orders. We can set up Hadoop/HBase for image storage on
hardware of your choice, sell a rack holding a petabyte of 'elastic'
HBase storage, or rent out our own cluster with a RESTful API.  If
anyone's interested, ping me off the list please.

Thanks.

-Jack

On Sun, Jan 27, 2013 at 8:06 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> We store image/media data in a second hbase cluster, but I don't see a
> reason why it would not work within the same cluster, as a separate
> column family for example.
>
> -Jack
>
> On Sun, Jan 27, 2013 at 1:58 PM, yiyu jia <[EMAIL PROTECTED]> wrote:
>> Hi Jack,
>>
>> Thank you. I had never heard of HOOP before. I should learn it.
>>
>> Also, do you store the metadata of each video clip directly in HDFS, or
>> do you have other storage like memcache?
>>
>> thanks and regards,
>>
>> Yiyu
>>
>>
>> On Sun, Jan 27, 2013 at 11:56 AM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>
>>> We did some experiments; the open source project HOOP works well for
>>> interfacing with HDFS, exposing a REST API on top of your file system.
>>>
>>> -Jack
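For context, Hoop later became HttpFS in Hadoop, which serves a WebHDFS-compatible REST API over plain HTTP. A sketch of how such a read URL is formed (host, port, path, and user here are hypothetical):

```python
from urllib.parse import urlencode

def hoop_read_url(host, path, user, port=14000):
    # WebHDFS-style URL as served by Hoop/HttpFS (all values are
    # placeholders); fetching it with any HTTP client streams the
    # file's bytes back, no Hadoop client libraries required.
    query = urlencode({"op": "OPEN", "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

url = hoop_read_url("hoop-host", "/images/img-001.jpg", "hbase")
print(url)
```

That HTTP-only access path is what makes it convenient as a front door for a file store.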
>>>
>>> On Sun, Jan 27, 2013 at 7:37 AM, yiyu jia <[EMAIL PROTECTED]> wrote:
>>> > Hi Jack,
>>> >
>>> > Thanks so much for sharing! Do you have comments on storing video in
>>> HDFS?
>>> >
>>> > thanks and regards,
>>> >
>>> > Yiyu
>>> >
>>> > On Sat, Jan 26, 2013 at 9:56 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
>>> >
>>> >> AFAIK, namenode would not like tracking 20 billion small files :)
>>> >>
>>> >> -jack
>>> >>
>>> >> On Sat, Jan 26, 2013 at 6:00 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
>>> >> > That's pretty amazing.
>>> >> >
>>> >> > What I am confused is, why did you go with hbase and not just straight
>>> >> into
>>> >> > hdfs?
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Fri, Jan 25, 2013 at 2:41 AM, Jack Levin <[EMAIL PROTECTED]>
>>> wrote:
>>> >> >
>>> >> >> Two people, including myself; it's fairly hands-off. It took about
>>> >> >> 3 months to tune it right, however we did have multiple years of
>>> >> >> experience with datanodes and hadoop in general, so that was a good
>>> >> >> boost.
>>> >> >>
>>> >> >> We have 4 hbase clusters today, image store being largest
>>> >> >> On Jan 24, 2013 2:14 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:
>>> >> >>
>>> >> >> > Jack, out of curiosity, how many people manage the hbase related
>>> >> servers?
>>> >> >> >
>>> >> >> > Does it require constant monitoring, or is it fairly hands-off
>>> >> >> > now? (Or a bit of both: the early days were getting things
>>> >> >> > right/learning, and now it's purring along.)
>>> >> >> >
>>> >> >> >
>>> >> >> > On Wed, Jan 23, 2013 at 11:53 PM, Jack Levin <[EMAIL PROTECTED]>
>>> >> wrote:
>>> >> >> >
>>> >> >> > > It's best to keep some RAM for caching of the filesystem;
>>> >> >> > > besides, we also run the datanode, which takes heap as well.
>>> >> >> > > Now, please keep in mind that even if you specify a heap of,
>>> >> >> > > say, 5GB, if your server opens threads to communicate with other
>>> >> >> > > systems via RPC (which hbase does a lot), you will in fact use
>>> >> >> > > HEAP + Nthreads * thread_stack_size.  There is a good Sun
>>> >> >> > > Microsystems document about it. (I don't have the link handy.)
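The arithmetic in that last point can be illustrated with made-up numbers (the 5 GB heap comes from the message; the thread count and the 256 KB per-thread stack, as set by a JVM flag like -Xss256k, are hypothetical):

```python
def jvm_footprint_mb(heap_mb, n_threads, stack_kb):
    # Approximate process memory: the configured heap plus one native
    # stack per live thread (the HEAP + Nthreads * thread_stack_size
    # rule of thumb from the message). Ignores metaspace/code cache.
    return heap_mb + n_threads * stack_kb / 1024

# 5 GB heap, 500 RPC threads, 256 KB stacks: the process uses
# noticeably more memory than the heap setting alone suggests.
total = jvm_footprint_mb(5 * 1024, 500, 256)
print(total)  # → 5245.0 (MB), i.e. 125 MB on top of the 5 GB heap
```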