Roman, thanks for sharing your experience. Is your approach somewhat
similar to Facebook's image store, Haystack?
I am very interested in knowing your use case, and what you actually mean
by class abstraction, internal write buffer, etc.
From: Roman Nikitchenko <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED],
Date: 10/23/2013 02:51 AM
Subject: Re: How can I insert large image or video into HBase?
We needed to store a LOT of files (mostly not so big, but they can be up
to a few GB) and decided to do it on top of HBase. We store files in
In short, the solution is:
1. Two column families: one for metadata and one for content.
2. Class abstraction that provides client with stream to write file or
3. An internal write buffer and an internal buffer of formed Puts, so write
speed is really good: up to 2 times better than HDFS on files below 128K.
4. If the client uses buffered writes, buffers map 1:1 to columns.
5. Seek is implemented with 2 client-side filters (to limit the column range
and to get only qualifiers). So on skip() we work out which block we need to
buffer and set the current position.
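
A minimal sketch of how points 3 to 5 could look in plain Java. It is not the implementation described above: the names ChunkedColumnStream and ColumnSink, the zero-padded qualifier format, and the block size are all assumptions made for illustration. The sink stands in for the code that would build HBase Puts, so the example runs without a cluster.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sink: a real client would turn each block into a Put. */
interface ColumnSink {
    void writeColumn(String qualifier, byte[] value);
}

/** Splits a byte stream into fixed-size blocks, one block per column. */
class ChunkedColumnStream extends OutputStream {
    private final byte[] buffer;
    private final ColumnSink sink;
    private int used = 0;        // bytes currently held in the buffer
    private long blockIndex = 0; // index of the next block to emit

    ChunkedColumnStream(int blockSize, ColumnSink sink) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.buffer = new byte[blockSize];
        this.sink = sink;
    }

    /** Assumed qualifier format: zero-padded so qualifiers sort in block order. */
    static String qualifierFor(long blockIndex) {
        return String.format(Locale.ROOT, "%010d", blockIndex);
    }

    /** Index of the block that contains the given absolute byte position. */
    static long blockOf(long position, int blockSize) {
        return position / blockSize;
    }

    /** Offset of the given absolute byte position inside its block. */
    static int offsetInBlock(long position, int blockSize) {
        return (int) (position % blockSize);
    }

    @Override
    public void write(int b) {
        buffer[used++] = (byte) b;
        if (used == buffer.length) emit();
    }

    @Override
    public void write(byte[] src, int off, int len) {
        while (len > 0) {
            int n = Math.min(len, buffer.length - used);
            System.arraycopy(src, off, buffer, used, n);
            used += n;
            off += n;
            len -= n;
            if (used == buffer.length) emit();
        }
    }

    /** Flushes the last, possibly shorter, block. */
    @Override
    public void close() {
        if (used > 0) emit();
    }

    private void emit() {
        sink.writeColumn(qualifierFor(blockIndex++), Arrays.copyOf(buffer, used));
        used = 0;
    }
}

class ChunkedColumnDemo {
    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        ChunkedColumnStream out =
            new ChunkedColumnStream(4, (q, v) -> written.add(q + ":" + v.length));
        out.write(new byte[10], 0, 10); // 10 bytes with 4-byte blocks
        out.close();
        System.out.println(written);    // [0000000000:4, 0000000001:4, 0000000002:2]
        // A skip() to absolute position 9 lands in block 2, at offset 1.
        System.out.println(ChunkedColumnStream.blockOf(9, 4));       // 2
        System.out.println(ChunkedColumnStream.offsetInBlock(9, 4)); // 1
    }
}
```

On the read side, a skip() to position p would then only need the column qualifierFor(blockOf(p, blockSize)) and would continue reading from offsetInBlock(p, blockSize). In HBase, filters such as ColumnRangeFilter and KeyOnlyFilter would be natural candidates for restricting the column range and fetching only qualifiers, though the message does not say which two filters were used.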
Advantages of this solution are:
- Small files problem is solved (shall I comment something here?).
- Thread safe without headaches.
- Possibility to use compression transparently.
- Metadata is really flexible (OK, HDFS can have it too, but again only by
using HBase; otherwise the small files problem is yours).
- Locality control due to regions (not possible with HDFS).
- Very effective MR processing due to previous point.
- Ability to use 'lightweight MR'
Disadvantages are:
- A somewhat more complex client. We have just encapsulated this and don't
care any more. Just today I plan to add support for SHA-1 hashes.
- On large files (10M and more) we are notably slower than HDFS. It can
probably be improved with MemStore configuration and so on, but I just don't
care; for our needs it is enough.
These ideas should be enough to understand the approach.
BTW, I am considering publishing this solution on GitHub to get some kind of
On 23 October 2013 07:12, Jean-Marc Spaggiari
> Put your file into HDFS and store only its name in HBase. HBase is not
> designed to store large files.
> 2013/10/23 Jack Chan <[EMAIL PROTECTED]>
> > Hi All:
> > This could be a stupid question. But here it goes...
> > We know that we can use "put" to insert small files by converting
> > them to bytes first.
> > But for a large file, I think we had better stream it.
> > So, how can we insert a large file into HBase from Java code in a
> > streaming way?
> > Thanks and regards
> > Jack Chan.