HBase, mail # dev - prefix compression implementation


Re: prefix compression implementation
Matt Corgan 2011-09-21, 01:04
jacek >> It is a huge chance. It would be great if we could prototype a few
things.
Especially I would like to avoid any optimizations before we know a good
way to measure them.

matt >> agree.  i'm not in a rush to get any of this integrated, just trying
to feel out the right long-term strategy.  do you have unit tests that
you're running on a substantial amount of data to compare different
implementations?
On Tue, Sep 20, 2011 at 4:58 PM, Jacek Migdal <[EMAIL PROTECTED]> wrote:

>
>
> On 9/20/11 10:59 AM, "Matt Corgan" <[EMAIL PROTECTED]> wrote:
>
> >bringing all questions into a single email:
> >
> >stack >> I'd say call it Cell rather than HCell.
> >
> >i did think the H was a very simple way to add uniqueness, like isn't
> >"HFile" a big win over "File"?  there are already two other classes called
> >"Cell" in hbase (guava and REST gateway).  another option could be KV,
> >though i don't like making exceptions to java's no-abbreviations
> >guidelines.
> KeyValueCell?
>
> To be honest, no name seems to be a very good option. However, it would be
> nice if it would be somewhat related to KeyValue.
>
> On a large scale, it would be hard to integrate this interface anytime soon.
> I would rather do it later.
>
> >stack >> There is a patch lying around that adds a version to KV by using
> >top two bytes of the type byte.  If you need me to dig it up, just say
> >(then you might not have to have v1 stuff in your Interface).
> >
> >not sure what you mean here.  top two bits?  you mean encoding the
> >timestamp inside the type byte?
> Versioning KeyValue per KeyValue seems to be crazy. Shouldn't it be per
> block or file?
>
>
> >(interface discussion)
> >
> It is a huge chance. It would be great if we could prototype a few things.
> Especially I would like to avoid any optimizations before we know a good
> way to measure them.
>
> Jacek
>
>