Interesting topic, constructing a file/block system on top of HBase. Is it similar to Facebook Haystack or Taobao TFS, which target small-file management? It would be great if you could open-source it on GitHub with some benchmarks, Roman~
Sounds like how well you utilize and configure the HBase region server's own mechanisms, such as compaction and flush, would be key to performance and maintenance effort?
Best Regards, Julian
On Oct 24, 2013, at 12:19 PM, Wei Tan <[EMAIL PROTECTED]> wrote:
> Roman, thanks for sharing your experience. Is your approach somewhat
> similar to Facebook's image store, Haystack?
> I am very interested in knowing your use case, and what you actually mean
> by class abstraction, internal write buffer, etc.
> Best regards,
> From: Roman Nikitchenko <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED],
> Date: 10/23/2013 02:51 AM
> Subject: Re: How can I insert large image or video into HBase?
> We had a need to store a LOT of files (mostly not so big, but they could be
> up to a few GB) and we decided to build it on top of HBase. We store files
> in column blocks.
> In short, the solution is:
> 1. Two column families: one for metadata and one for content.
> 2. A class abstraction that provides the client with a stream to write or
> read a file.
> 3. An internal write buffer and an internal buffer of formed Puts, so write
> speed is really good: up to 2x better than HDFS on files below 128 KB.
> 4. If the client uses buffered writes, buffers map 1:1 to columns
> (segmentation control).
> 5. Seek is implemented with two client-side filters (to limit the column
> range and fetch only the qualifiers). Based on skip() we determine which
> block to buffer and set the current position.
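[Editor's note: the block layout and seek arithmetic in points 1-5 above could be sketched roughly as below. All names and the 128 KB block size are illustrative assumptions, not Roman's actual code, and the HBase Put/Get calls themselves are omitted so the sketch runs standalone.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the column-block layout: file content is split into
// fixed-size chunks, each stored under its own qualifier in the "content"
// family; seek() maps a byte position to a block qualifier plus an offset.
public class BlockLayout {
    static final int BLOCK_SIZE = 128 * 1024; // assumed 128 KB per column block

    // Qualifier for block n, zero-padded so qualifiers sort in block order.
    static String qualifierFor(long blockIndex) {
        return String.format("b%010d", blockIndex);
    }

    // Split a file's bytes into column-sized blocks (one Put value each).
    static List<byte[]> split(byte[] data) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += BLOCK_SIZE) {
            blocks.add(Arrays.copyOfRange(data, off,
                    Math.min(off + BLOCK_SIZE, data.length)));
        }
        return blocks;
    }

    // seek()/skip(): which block holds byte position pos, and where inside it.
    static long blockForOffset(long pos) { return pos / BLOCK_SIZE; }
    static int offsetInBlock(long pos)  { return (int) (pos % BLOCK_SIZE); }

    public static void main(String[] args) {
        byte[] file = new byte[300 * 1024]; // a 300 KB example file
        System.out.println(split(file).size());                    // 3
        System.out.println(qualifierFor(blockForOffset(200_000))); // b0000000001
        System.out.println(offsetInBlock(200_000));                // 68928
    }
}
```

A read at position 200,000 thus needs only block 1 fetched, which is what the qualifier-limiting filters in point 5 enable.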
> Advantages of this solution are:
> - Small files problem is solved (shall I comment something here?).
> - Thread-safe without headaches.
> - Possibility to use compression transparently.
> - Metadata is really flexible (OK, HDFS can get it too, but again by using
> HBase; otherwise the small-files problem is yours).
> - Locality control due to regions (not possible with HDFS).
> - Very efficient MR processing thanks to the previous point.
> - Ability to use 'lightweight MR'.
> Disadvantages are:
> - A somewhat more complex client. We have just encapsulated this and don't
> care any more. Just today I plan to add SHA-1 hash support.
> - On large files (10 MB and more) we are notably slower than HDFS. It could
> probably be improved with MemStore configuration and so on, but for our
> needs it is enough.
> These ideas should be enough to understand the approach.
> BTW, I am considering publishing this solution on GitHub to get some kind
> of 'community review'.
> Best regards,
> On 23 October 2013 07:12, Jean-Marc Spaggiari
> <[EMAIL PROTECTED]>wrote:
>> Put your file into HDFS and store only the name in HBase. HBase is not
>> made to store large files.
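[Editor's note: Jean-Marc's pattern, roughly sketched. A HashMap and the local filesystem stand in for the HBase table and HDFS so the example runs standalone; in a real deployment the index entry would be an HBase Put and the content would go through an HDFS FileSystem stream. Class and method names are made up for illustration.]

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// The large bytes live outside the database; only a small pointer (the path)
// is stored per row key, which is all HBase has to serve.
public class PointerStore {
    private final Map<String, String> index = new HashMap<>(); // rowKey -> path
    private final Path dir;

    PointerStore(Path dir) { this.dir = dir; }

    void store(String rowKey, byte[] content) throws IOException {
        Path file = dir.resolve(rowKey + ".bin");
        Files.write(file, content);          // large bytes go to the file store
        index.put(rowKey, file.toString()); // only the small pointer is indexed
    }

    byte[] load(String rowKey) throws IOException {
        return Files.readAllBytes(Path.of(index.get(rowKey)));
    }

    public static void main(String[] args) throws IOException {
        PointerStore store = new PointerStore(Files.createTempDirectory("blobs"));
        store.store("video-42", "fake video bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.load("video-42"), StandardCharsets.UTF_8));
    }
}
```

The trade-off versus Roman's column-block approach: this keeps rows tiny and compactions cheap, but gives up HBase-side locality control and per-block seek.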
>> 2013/10/23 Jack Chan <[EMAIL PROTECTED]>
>> > Hi All:
>> > This could be a stupid question, but here it goes....
>> > We know that we can use "put" to insert some small files by converting
>> > them to bytes first.
>> > But for a large file, I think we had better stream it first.
>> > So, how can we insert a large file into HBase through Java code in a
>> > streaming way?
>> > Thanks and regards
>> > Jack Chan.