Do you care about low latency? If so, storing big files in HBase may not be a good choice, especially files of a few GB; that will definitely bring GC pain :)
From: Roman Nikitchenko [[EMAIL PROTECTED]]
Sent: 23 October 2013 14:50
To: [EMAIL PROTECTED]
Subject: Re: How can I insert large image or video into HBase?
We needed to store a LOT of files (mostly not so big, but they could be up
to a few GB) and we decided to do it on top of HBase. We store files in
In short, the solution is:
1. Two column families: one for metadata and one for content.
2. A class abstraction that provides the client with a stream to write a file or read
3. An internal write buffer and an internal buffer of formed Puts, so write speed
is really good: up to 2 times better than HDFS on files below 128 KB.
4. If the client uses buffered writes, buffers map 1:1 to columns.
5. Seek is implemented with 2 client filters (one to limit the column range and
one to get only qualifiers). So on skip() we check which block we should
buffer and set the current position.
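
To make points 3 to 5 concrete, here is a minimal sketch of the block-per-column idea. It is not the code described above and it does not touch the HBase API: a TreeMap stands in for the content column family of one row, and the class name, the zero-padded qualifier format and the block size are all assumptions made for illustration.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeMap;

/** Sketch only: a TreeMap stands in for the "content" column family of one HBase row. */
class ChunkedFileSketch {
    /** Hypothetical qualifier format: zero-padded block index, so qualifiers sort in block order. */
    static String qualifier(long blockIndex) {
        return String.format(Locale.ROOT, "%010d", blockIndex);
    }

    /** Index of the block that holds the given byte offset; a seek or skip would look this up. */
    static long blockFor(long position, int blockSize) {
        return position / blockSize;
    }

    /** OutputStream that cuts the byte stream into fixed-size blocks, one column per block. */
    static class ChunkedOutputStream extends OutputStream {
        private final TreeMap<String, byte[]> contentFamily;
        private final byte[] buffer;
        private int used = 0;
        private long nextBlock = 0;

        ChunkedOutputStream(TreeMap<String, byte[]> contentFamily, int blockSize) {
            this.contentFamily = contentFamily;
            this.buffer = new byte[blockSize];
        }

        @Override
        public void write(int b) {
            buffer[used++] = (byte) b;
            if (used == buffer.length) {
                flushBlock();
            }
        }

        /** In a real client this is where a put for one column would be queued. */
        private void flushBlock() {
            if (used == 0) {
                return;
            }
            contentFamily.put(qualifier(nextBlock++), Arrays.copyOf(buffer, used));
            used = 0;
        }

        @Override
        public void close() {
            flushBlock(); // store the last, possibly short, block
        }
    }

    /** Writes data through the chunking stream and returns the resulting columns. */
    static TreeMap<String, byte[]> store(byte[] data, int blockSize) {
        TreeMap<String, byte[]> content = new TreeMap<>();
        ChunkedOutputStream out = new ChunkedOutputStream(content, blockSize);
        for (byte b : data) {
            out.write(b);
        }
        out.close();
        return content;
    }

    /** Reads one byte at an absolute position by fetching only the block that contains it. */
    static byte readAt(TreeMap<String, byte[]> contentFamily, long position, int blockSize) {
        byte[] block = contentFamily.get(qualifier(blockFor(position, blockSize)));
        return block[(int) (position % blockSize)];
    }

    public static void main(String[] args) {
        byte[] data = "hello chunked world".getBytes(StandardCharsets.US_ASCII);
        TreeMap<String, byte[]> content = store(data, 8);
        System.out.println(content.size() + " blocks, byte 9 = " + (char) readAt(content, 9, 8));
    }
}
```

In a real client the map would be replaced by puts against the content family, and readAt would become a get restricted to a single qualifier, which is presumably what the column range filter is for.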
Advantages of this solution are:
- The small files problem is solved (need I comment on this?).
- Thread safe without headaches.
- Possibility to use compression transparently.
- Metadata is really flexible (OK, with HDFS you can get this too, but again only
by using HBase; otherwise the small files problem is yours).
- Locality control due to regions (not possible with HDFS).
- Very effective MR processing thanks to the previous point.
- Ability to use 'lightweight MR'
Disadvantages are:
- Somewhat more complex client. We have just encapsulated this and don't
care any more. Just today I plan to add SHA1 hash support.
- On large files (10 MB and more) we are notably slower than HDFS. Probably
it can be improved with MemStore configuration and so on, but I just don't
care; for our needs it is enough.
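
On the SHA1 support mentioned above: one way it could be done (a sketch, not the planned implementation) is to wrap the file stream in java.security.DigestOutputStream, so the hash is computed while the bytes are written. The class name is made up and a ByteArrayOutputStream stands in for the real file stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch only: computes SHA-1 as a side effect of writing to a stream. */
class Sha1WhileWriting {
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Stand-in sink; a real client would pass its own file output stream here.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (DigestOutputStream out = new DigestOutputStream(sink, sha1)) {
                out.write(data); // every byte written also updates the digest
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting hex string could then be stored as one more column in the metadata family once the stream is closed.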
These ideas should be enough to understand the approach.
BTW, I am considering publishing this solution on GitHub to get some kind of
On 23 October 2013 07:12, Jean-Marc Spaggiari <[EMAIL PROTECTED]> wrote:
> Put your file into HDFS and store only the name in HBase. HBase is not
> made to store large files.
> 2013/10/23 Jack Chan <[EMAIL PROTECTED]>
> > Hi All:
> > This could be a stupid question, but here it goes...
> > We know that we can use "put" to insert small files by converting them
> > to bytes first.
> > But for a large file, I think we had better stream it.
> > So, how can we insert a large file into HBase from Java code in a
> > streaming way?
> > Thanks and regards
> > Jack Chan.