HBase >> mail # dev >> DISCUSS : HFile V3 proposal for tags in 0.96


Re: DISCUSS : HFile V3 proposal for tags in 0.96
I have been following comments on HBASE-8496.

I think introducing cell tagging through HFile v3 is acceptable.

Looking forward to seeing your implementation.

Cheers

On Thu, Jul 18, 2013 at 10:14 AM, ramkrishna vasudevan <[EMAIL PROTECTED]> wrote:

> For the past couple of months, we have been working through various
> prototypes for supporting inline storage of tags in cells as persisted on
> disk. Our goals are to support optional use of tags with minimal changes to
> core code while also avoiding performance impacts to users who do not use
> tags.
>
> For background, refer to the comments in
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13708228&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13708228
>
> and
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
> We have iterated on a couple of prototypes that implement tag awareness,
> first in DataBlockEncoders and later as a new type of Codec for Cells. This
> point is discussed in the above comments in HBASE-8496.
>
> We think that tag awareness in Cell Codecs is the right way, but there are
> some shortcomings with the current interfaces internal to HFile that need
> to be addressed in order to avoid any performance impact for those who do
> not want to use inline tags, and addressing them may involve a drastic
> amount of code change.
>
> By introducing inline tags in a new V3 version of HFile, we can avoid
> several problems with HFile V2 internals and backwards compatibility
> concerns. This allows working tags support with no performance impact and
> low risk for all HBase users who do not want tag support, while still
> making inline tags available in a shipping version of HBase.
>
> The new V3 version of HFile differs from earlier versions by supporting
> inline tag storage. This version does not change the HFileBlock format;
> it just serializes and deserializes the Tag information that would be
> persisted in the HFile. Having HFile V3 would also help keep Tags
> optional, so that existing cases where there are no tags are totally
> unaffected. We also ensure that the changes outside of the V3 reader and
> writer are kept minimal. Compatibility would not be a problem with future
> versions when we move to Cell Codecs: the codecs used for writing the
> file will be persisted in the HFile header. For files that are either V2
> or V3, we will instantiate two default codecs that know how to deal with
> serialization with and without tags.
>
> There has been earlier discussion of an HFile V3, e.g.:
>
>
> https://issues.apache.org/jira/browse/HBASE-8496?focusedCommentId=13710653&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13710653
>
> We have been working on this and will have a clean patch with a good
> amount of testing in time for 0.96.
>
> Although our focus is on performance-neutral persistence of inline cell
> tags in 0.96 to enable a couple of security coprocessor users, introducing
> an HFile V3 also provides design freedom for other features and problems
> that can be developed through the 0.96 cycle into 0.98.
>
> Please voice your opinion on this so that we can make this clear and maybe
> define the scope of the patch. Also feel free to comment on HBASE-8496 with
> your thoughts and ideas.
>
> Regards
>
> Ram
>
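The codec arrangement described in the proposal (codecs recorded in the HFile header, with two default codecs for files with and without tags) can be sketched in Java as below. Everything here is a hypothetical illustration, not HBase's API or its actual on-disk format: the names `Cell`, `CellCodec`, `KeyValueCodec`, `codecFor`, the `"tags"`/`"notags"` header values, and the byte layout (a 2-byte tags length appended after the key and value) are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, NOT the HBase API: all names, the header codec values
// and the byte layout are illustrative assumptions.
final class HFileCodecSketch {

  /** Minimal cell: key, value and an optional tags blob (empty array = no tags). */
  static final class Cell {
    final byte[] key, value, tags;
    Cell(byte[] key, byte[] value, byte[] tags) {
      this.key = key; this.value = value; this.tags = tags;
    }
  }

  static Cell cell(String key, String value, String tags) {
    return new Cell(key.getBytes(StandardCharsets.UTF_8),
        value.getBytes(StandardCharsets.UTF_8), tags.getBytes(StandardCharsets.UTF_8));
  }

  interface CellCodec {
    void write(DataOutputStream out, Cell c) throws IOException;
    Cell read(DataInputStream in) throws IOException;
  }

  /** Assumed layout: keyLen(int), valueLen(int), key, value, then tagsLen(u16) + tags if withTags. */
  static final class KeyValueCodec implements CellCodec {
    final boolean withTags;
    KeyValueCodec(boolean withTags) { this.withTags = withTags; }
    public void write(DataOutputStream out, Cell c) throws IOException {
      if (!withTags && c.tags.length > 0) throw new IllegalArgumentException("no tags in V2");
      if (c.tags.length > 0xFFFF) throw new IllegalArgumentException("tags too long");
      out.writeInt(c.key.length);
      out.writeInt(c.value.length);
      out.write(c.key);
      out.write(c.value);
      if (withTags) {            // only the tags-aware codec writes the extra fields
        out.writeShort(c.tags.length);
        out.write(c.tags);
      }
    }
    public Cell read(DataInputStream in) throws IOException {
      byte[] key = new byte[in.readInt()];
      byte[] value = new byte[in.readInt()];
      in.readFully(key);
      in.readFully(value);
      byte[] tags = new byte[withTags ? in.readUnsignedShort() : 0];
      in.readFully(tags);        // no-op for a zero-length array
      return new Cell(key, value, tags);
    }
  }

  // The two default codecs: one without tags, one with tags.
  static final CellCodec NO_TAGS = new KeyValueCodec(false);
  static final CellCodec WITH_TAGS = new KeyValueCodec(true);

  /** Prefer the codec named in the (hypothetical) header; else default by major version. */
  static CellCodec codecFor(int majorVersion, String headerCodec) {
    if ("tags".equals(headerCodec) || (headerCodec == null && majorVersion == 3)) {
      return WITH_TAGS;
    }
    if ("notags".equals(headerCodec) || (headerCodec == null && majorVersion == 2)) {
      return NO_TAGS;
    }
    throw new IllegalArgumentException("v" + majorVersion + ", codec " + headerCodec);
  }

  static byte[] encode(CellCodec codec, Cell c) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bos)) {
      codec.write(out, c);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  static Cell decode(CellCodec codec, byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return codec.read(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch a V2 file never pays for tags, and an untagged cell written with the tags-aware codec still carries a 2-byte length prefix, which illustrates why keeping tag-free data on the existing path leaves it unaffected. Letting a codec name in the header override the version default is one way to read the proposal's point that future Cell Codecs stay compatible.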