DISCUSS : HFile V3 proposal for tags in 0.96
For the past couple of months, we have been working through various
prototypes for supporting inline storage of tags in cells as persisted on
disk. Our goals are to support optional use of tags with minimal changes to
core code while also avoiding performance impacts to users who do not use
tags.

 For background, refer to the comments in

https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228

and

https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653

We have iterated on a couple of prototypes that implement tag awareness,
first in DataBlockEncoders and later as a new type of Codec for Cells. This
point is discussed in the above comments in HBASE-8496.

We think that tag awareness in Cell Codecs is the right way, but there are
some shortcomings in the current interfaces internal to HFile that need to
be addressed in order to avoid any performance impact on those who do not
want to use inline tags. Addressing them may involve a drastic amount of
code change.

By introducing inline tags in a new V3 version of HFile, we can avoid
several problems with HFile V2 internals as well as backwards compatibility
concerns. This allows working tag support in a shipping version of HBase,
with no performance impact and low risk for all HBase users who do not want
tag support.

The new V3 version of HFile differs from earlier versions by supporting
inline tag storage.  This version does not change the HFileBlock format; it
only serializes and deserializes the tag information that would be
persisted in the HFile.  Having HFile V3 would also help to keep tags
optional, so that existing cases where there are no tags are totally
unaffected.  We will also keep the changes outside of the V3 reader and
writer minimal.  Compatibility would not be a problem with future versions
when we go with Cell Codecs.  The codec used for writing a file will be
persisted in the HFile header.  For files that are either V2 or V3, we
will instantiate two default codecs that know how to deal with
serialization with and without tags.
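
To make the idea of two default codecs concrete, here is a minimal Java
sketch. Everything in it is an assumption made for illustration: the names
TagCodecSketch, Cell, CellCodec, WITHOUT_TAGS, WITH_TAGS and codecFor are
hypothetical and are not HBase APIs, the byte layout is not the real HFile
format, and the codec is picked from the file version here rather than read
from the HFile header as proposed above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: none of these names are HBase APIs, and the byte
// layout is an assumption, not the real HFile V2/V3 format.
final class TagCodecSketch {

    /** Simplified cell: an opaque key/value payload plus optional tag bytes. */
    static final class Cell {
        final byte[] keyValue;
        final byte[] tags; // zero-length when the cell carries no tags

        Cell(byte[] keyValue, byte[] tags) {
            this.keyValue = keyValue;
            this.tags = tags;
        }
    }

    /** Hypothetical codec interface; one implementation per on-disk layout. */
    interface CellCodec {
        void write(DataOutputStream out, Cell cell) throws IOException;
        Cell read(DataInputStream in) throws IOException;
    }

    /** V2-style codec: writes only the key/value payload, so tags are dropped. */
    static final CellCodec WITHOUT_TAGS = new CellCodec() {
        @Override
        public void write(DataOutputStream out, Cell cell) throws IOException {
            out.writeInt(cell.keyValue.length);
            out.write(cell.keyValue);
        }

        @Override
        public Cell read(DataInputStream in) throws IOException {
            byte[] kv = new byte[in.readInt()];
            in.readFully(kv);
            return new Cell(kv, new byte[0]);
        }
    };

    /** V3-style codec: same payload, followed by a length-prefixed tag section. */
    static final CellCodec WITH_TAGS = new CellCodec() {
        @Override
        public void write(DataOutputStream out, Cell cell) throws IOException {
            if (cell.tags.length > 0xFFFF) {
                throw new IllegalArgumentException("tags too long for a 2-byte length");
            }
            WITHOUT_TAGS.write(out, cell);    // identical prefix to the V2-style layout
            out.writeShort(cell.tags.length); // assumed 2-byte tag length
            out.write(cell.tags);
        }

        @Override
        public Cell read(DataInputStream in) throws IOException {
            Cell base = WITHOUT_TAGS.read(in);
            byte[] tags = new byte[in.readUnsignedShort()];
            in.readFully(tags);
            return new Cell(base.keyValue, tags);
        }
    };

    /** Picks the default codec for a file version (assumed rule: V3 and up carry tags). */
    static CellCodec codecFor(int hfileMajorVersion) {
        return hfileMajorVersion >= 3 ? WITH_TAGS : WITHOUT_TAGS;
    }

    static byte[] encode(int version, Cell cell) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            codecFor(version).write(new DataOutputStream(bytes), cell);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Cell decode(int version, byte[] data) {
        try {
            return codecFor(version).read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a cell with a 3-byte payload and 2 tag bytes encodes to 7
bytes with the V2-style codec and 11 bytes with the V3-style codec, and the
V2-style path never touches any tag-handling code. That is the property we
want for users who do not use tags.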

There has been earlier discussion of an HFile V3, e.g.:

https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653

We have been working on this and will have a clean patch with a good amount
of testing in time for 0.96.

Although our focus is on performance-neutral persistence of inline cell
tags in 0.96 to enable a couple of security coprocessor users, introducing
an HFile V3 also provides design freedom for other features and fixes that
can be developed through the 0.96 cycle into 0.98.

Please voice your opinion on this so that we can settle the approach and
define the scope of the patch.  Also feel free to comment on HBASE-8496
with your thoughts and ideas.

Regards

Ram