HBase >> mail # user >> Re: Storing images in Hbase


Re: Storing images in Hbase
If I were to design a large object store on HBase, I would do the
following: Under a threshold, store the object data into HBase. Over the
threshold, store metadata for the object only into HBase and the object
data itself into a file in HDFS. The threshold could be a fixed byte size
like 100 MB, or you could segment storage by MIME type, for example image/*
into HBase and video/* into HDFS. Video objects might be as large as 5-10
GB, full length features, depending on encoding bitrate. HBase can pack
millions or billions of small objects into much larger indexed files that
can be quickly retrieved, and this helps avoid namespace pressures on the
HDFS NameNode. However, the HBase API cannot do positioned reads of partial
byte ranges of stored objects, while the HDFS API can. Put smaller objects
into HBase. Put larger objects into HDFS so you can stream them at
approximately the same rate that the end user reads them and minimize
overheads for server side buffering. As Jack mentions, there is Hoop (
https://github.com/cloudera/hoop) or WebHDFS (
http://hadoop.apache.org/docs/stable/webhdfs.html) for accessing HDFS via a
RESTful API. Both will let you do positioned reads of partial byte ranges
out of HDFS. On the HBase side, there is HBase's REST interface (
http://wiki.apache.org/hadoop/Hbase/Stargate). Put a cache in between the
HDFS and HBase services and the front end because even with the
capabilities of HBase and HDFS you should always have a caching tier
between the datastore and the front end.
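The routing policy described above can be sketched in a few lines. This is an illustrative sketch only: the function names, the 100 MB threshold, and the MIME-type split are taken from the suggestions in this mail, not from any actual HBase or HDFS API. The WebHDFS URL shape follows the documented OPEN operation with its offset and length parameters.

```python
# Sketch of the size/MIME routing policy: small objects into HBase,
# large objects into HDFS. Names and thresholds are illustrative.

SIZE_THRESHOLD = 100 * 1024 * 1024  # 100 MB, the example threshold above

def choose_store(size_bytes, mime_type=None):
    """Return 'hbase' or 'hdfs' for an object of the given size/type."""
    if mime_type is not None:
        if mime_type.startswith("image/"):
            return "hbase"  # small objects pack well into HBase's indexed files
        if mime_type.startswith("video/"):
            return "hdfs"   # large streams benefit from positioned reads
    return "hbase" if size_bytes < SIZE_THRESHOLD else "hdfs"

def webhdfs_range_url(namenode, path, offset, length):
    """Build a WebHDFS OPEN URL for a positioned read of a byte range."""
    return (f"http://{namenode}/webhdfs/v1{path}"
            f"?op=OPEN&offset={offset}&length={length}")
```

For example, choose_store(5 * 1024**3, "video/mp4") routes a 5 GB video to HDFS, while choose_store(50 * 1024, "image/jpeg") keeps a thumbnail in HBase; webhdfs_range_url lets the front end fetch just the byte range a client requested.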
On Sun, Jan 27, 2013 at 8:56 AM, Jack Levin <[EMAIL PROTECTED]> wrote:

> We did some experiments, open source project HOOP works well with
> interfacing to HDFS to expose REST Api interface to your file system.
>
> -Jack
>
> On Sun, Jan 27, 2013 at 7:37 AM, yiyu jia <[EMAIL PROTECTED]> wrote:
> > Hi Jack,
> >
> > Thanks so much for sharing! Do you have comments on storing video in
> HDFS?
> >
> > thanks and regards,
> >
> > Yiyu
> >
> > On Sat, Jan 26, 2013 at 9:56 PM, Jack Levin <[EMAIL PROTECTED]> wrote:
> >
> >> AFAIK, namenode would not like tracking 20 billion small files :)
> >>
> >> -jack
> >>
> >> On Sat, Jan 26, 2013 at 6:00 PM, S Ahmed <[EMAIL PROTECTED]> wrote:
> >> > That's pretty amazing.
> >> >
> >> > What I am confused about is, why did you go with hbase and not just straight
> >> into
> >> > hdfs?
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jan 25, 2013 at 2:41 AM, Jack Levin <[EMAIL PROTECTED]>
> wrote:
> >> >
> >> >> Two people including myself, it's fairly hands-off. Took about 3
> months
> >> to
> >> >> tune it right, however we did have multiple years of experience
> with
> >> >> datanodes and hadoop in general, so that was a good boost.
> >> >>
> >> >> We have 4 hbase clusters today, image store being largest
> >> >> On Jan 24, 2013 2:14 PM, "S Ahmed" <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> > Jack, out of curiosity, how many people manage the hbase related
> >> servers?
> >> >> >
> >> >> > Does it require constant monitoring or is it fairly hands-off now?
>  (or
> >> a
> >> >> bit
> >> >> > of both, early days was getting things right/learning and now it's
> >> purring
> >> >> > along).
> >> >> >
> >> >> >
> >> >> > On Wed, Jan 23, 2013 at 11:53 PM, Jack Levin <[EMAIL PROTECTED]>
> >> wrote:
> >> >> >
> >> >> > > Its best to keep some RAM for caching of the filesystem, besides
> we
> >> >> > > also run datanode which takes heap as well.
> >> >> > > Now, please keep in mind that even if you specify heap of say
> 5GB,
> >> if
> >> >> > > your server opens threads to communicate with other systems via
> RPC
> >> >> > > (which hbase does a lot), you will indeed use HEAP +
> >> >> > > Nthreads*thread_stack_kb_size.  There is a good Sun Microsystems
> document
> >> >> > > about it. (I don't have the link handy).
> >> >> > >
> >> >> > > -Jack
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Mon, Jan 21, 2013 at 5:10 PM, Varun Sharma <
> [EMAIL PROTECTED]>
> >> >> > wrote:
> >> >> > > > Thanks for the useful information. I wonder why you use only 5G

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)