HBase, mail # dev - DISCUSS : HFile V3 proposal for tags in 0.96


Re: DISCUSS : HFile V3 proposal for tags in 0.96
Ted Yu 2013-07-18, 17:23
I have been following comments on HBASE-8496.

I think introducing cell tagging through HFile v3 is acceptable.

Looking forward to seeing your implementation.

Cheers

On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <
[EMAIL PROTECTED]> wrote:

> For the past couple of months, we have been working through various
> prototypes for supporting inline storage of tags in cells as persisted on
> disk. Our goals are to support optional use of tags with minimal changes to
> core code while also avoiding performance impacts to users who do not use
> tags.
>
>  For background, refer to the comments in
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
>
> and
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
>  We have iterated on a couple of prototypes that implement tag awareness,
> first in DataBlockEncoders and later as a new type of Codec for Cells. This
> point is discussed in the above comments in HBASE-8496.
>
> We think that tag awareness in Cell Codecs is the right way, but there are
> some shortcomings with the current interfaces internal to HFile that need
> to be addressed in order to avoid any performance impacts for those who do not
> want to use inline tags, and that may involve a drastic amount of code
> change.
>
>  Introducing this in a new V3 version of HFile lets us avoid several
> problems with HFile V2 internals, as well as backwards compatibility
> concerns. It allows working tag support in a shipping version of HBase,
> with no performance impact and low risk for HBase users who do not want
> tag support.
>
>  The new V3 version of HFile differs from earlier versions by supporting
> inline tag storage.  This version does not change the HFileBlock format;
> it only serializes and deserializes the Tag information that would
> be persisted in the HFile. Having HFile V3 would also help to keep Tags
> optional, so that existing cases where there are no tags are totally
> unaffected.  We also ensure that the changes outside of the V3
> reader and writer stay minimal.  Compatibility would not be a problem with
> future versions when we go with Cell Codecs.  The Codec used for writing
> the file will be persisted in the HFile header.  Now, for files that are
> either V2 or V3, we will instantiate two default codecs that know how to
> deal with serialization with and without tags.
>
>  There have been earlier thoughts on an HFile V3, e.g.:
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
>  We have been working on this and will have a clean patch with a good
> amount of testing in time for 0.96.
>
> Although our focus is on performance-neutral persistence of inline cell
> tags in 0.96 to enable a couple of security coprocessor users, introducing
> an HFile V3 provides design freedom for some other features and problems
> too that can be developed through the 0.96 cycle into 0.98.
>
> Please voice your opinion on this so that we can make this clear and maybe
> define the scope of the patch.  Also feel free to comment on HBASE-8496
> with your thoughts and ideas.
>
> Regards
>
> Ram
>
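
For readers following the thread, the codec selection described in the quoted proposal could be sketched roughly as below. This is only an illustration under stated assumptions, not the patch from HBASE-8496: the class and method names (`TagCodecSketch`, `CellCodec`, `NoTagsCodec`, `WithTagsCodec`, `codecFor`) are hypothetical, and the byte layout (4-byte key and value lengths, plus a 2-byte tags length for V3) is assumed for the example rather than taken from the HFile specification.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: none of these names are real HBase classes.
final class TagCodecSketch {

    /** A decoded cell; tags is empty when the file carries none. */
    static final class Cell {
        final byte[] key, value, tags;
        Cell(byte[] key, byte[] value, byte[] tags) {
            this.key = key; this.value = value; this.tags = tags;
        }
    }

    interface CellCodec {
        byte[] encode(Cell cell);
        Cell decode(ByteBuffer in);
    }

    /** Assumed tag-free layout: keyLen(4), valueLen(4), key, value. */
    static final class NoTagsCodec implements CellCodec {
        public byte[] encode(Cell c) {
            ByteBuffer out = ByteBuffer.allocate(8 + c.key.length + c.value.length);
            out.putInt(c.key.length).putInt(c.value.length).put(c.key).put(c.value);
            return out.array();
        }
        public Cell decode(ByteBuffer in) {
            byte[] key = new byte[in.getInt()];
            byte[] value = new byte[in.getInt()];
            in.get(key);
            in.get(value);
            return new Cell(key, value, new byte[0]);
        }
    }

    /** Assumed tagged layout: the tag-free layout followed by tagsLen(2), tags. */
    static final class WithTagsCodec implements CellCodec {
        public byte[] encode(Cell c) {
            if (c.tags.length > 0xFFFF) {
                throw new IllegalArgumentException("tags too long for a 2-byte length");
            }
            ByteBuffer out = ByteBuffer.allocate(
                10 + c.key.length + c.value.length + c.tags.length);
            out.putInt(c.key.length).putInt(c.value.length).put(c.key).put(c.value);
            out.putShort((short) c.tags.length).put(c.tags);
            return out.array();
        }
        public Cell decode(ByteBuffer in) {
            byte[] key = new byte[in.getInt()];
            byte[] value = new byte[in.getInt()];
            in.get(key);
            in.get(value);
            byte[] tags = new byte[in.getShort() & 0xFFFF]; // read length as unsigned
            in.get(tags);
            return new Cell(key, value, tags);
        }
    }

    /** Picks a default codec from the file's major version (2 or 3). */
    static CellCodec codecFor(int majorVersion) {
        switch (majorVersion) {
            case 2: return new NoTagsCodec();
            case 3: return new WithTagsCodec();
            default:
                throw new IllegalArgumentException(
                    "unsupported HFile version: " + majorVersion);
        }
    }

    public static void main(String[] args) {
        Cell cell = new Cell("row1".getBytes(), "v".getBytes(), "acl".getBytes());
        byte[] v2 = codecFor(2).encode(cell); // tags are simply not written
        byte[] v3 = codecFor(3).encode(cell);
        System.out.println("V2 bytes: " + v2.length + ", V3 bytes: " + v3.length);
    }
}
```

In this sketch the V2 path never reads or writes a tags field, which mirrors the stated goal that files without tags are unaffected; only a file marked as V3 pays for the extra length field.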