Accumulo user mailing list: Accumulo design questions


Sukant Hajra 2012-11-06, 19:01
John Vines 2012-11-06, 19:16
Keith Turner 2012-11-06, 19:54
Re: Accumulo design questions
On Tue, Nov 6, 2012 at 11:01 AM, Sukant Hajra <[EMAIL PROTECTED]> wrote:

> I've been trying to understand Accumulo more deeply as we use it more.  To
> supplement the online documentation and source, I've been referencing some
> blog articles on HBase (Lars George has some good ones), HBase docs, and the
> BigTable paper.
>
> But I'm curious about some of the deviations of Accumulo from BigTable and
> HBase.
>
> The questions I have right now are:
>
>     1. Is the format of an RFile close to HFile version 1, HFile version 2,
>     or at this point is the format really its own thing?  I found good
>     documentation on the HFile, but I haven't yet found similar documentation
>     on RFiles.  There's the source code, but I haven't dug into that yet.
>

I think there is a different HFile for each column family, isn't there?  An
RFile stores all columns and all locality groups in a single file, which is
another reason you don't get the same performance penalty for having lots
of column families in Accumulo.
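A toy model of the layout difference described above may help. It is a sketch under stated assumptions, not the real HFile or RFile byte format: cells are hypothetical `(row, family, qualifier, value)` tuples, the "HBase-style" side writes one file per column family, and the "RFile-style" side writes one file whose sections are keyed by locality group, with unlisted families falling into a default group (labelled `None` here, a made-up convention).

```python
from collections import defaultdict


def per_family_files(cells):
    """HBase-style model: one file per column family, so files grow with families."""
    files = defaultdict(list)
    for cell in sorted(cells):
        files[cell[1]].append(cell)  # cell[1] is the column family
    return dict(files)


def single_file_by_locality_group(cells, groups):
    """RFile-style model: always one file, partitioned into locality-group sections.

    `groups` maps a hypothetical group name to the set of families it holds.
    Families not listed in any group go to the default group, keyed as None.
    """
    family_to_group = {fam: name for name, fams in groups.items() for fam in fams}
    sections = defaultdict(list)
    for cell in sorted(cells):
        sections[family_to_group.get(cell[1])].append(cell)
    return {"file": dict(sections)}


# Hypothetical data: three distinct column families.
cells = [
    ("r1", "meta", "a", "1"),
    ("r1", "img", "b", "2"),
    ("r2", "tag7", "c", "3"),
]

print(len(per_family_files(cells)))  # 3 files, one per family
layout = single_file_by_locality_group(cells, {"small": {"meta"}})
print(len(layout))  # 1 file, regardless of how many families appear
```

Adding a fourth family to `cells` would add a fourth file on the per-family side but only another entry inside the same file on the locality-group side, which is the intuition behind the lower per-family cost.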
>
>     2. I understand that HBase doesn't do well with too many column families.
>     However, creating too many column families in HBase isn't likely anyway
>     because you can't (I believe) create them dynamically.  Accumulo allows
>     you to create column families dynamically.  But I wonder if this can come
>     at a cost.  Is there a benefit to using column families less frequently
>     if possible in Accumulo?  Or is the cost of using column families more or
>     less the same as using column qualifiers?
>
>     3. I guess one way families might be different from qualifiers relates to
>     HBase's recommendation to keep column family names short to avoid
>     needless storage waste.  That should apply to Accumulo as well, right?
>
>     4. In supporting dynamic column families, was there a design trade-off
>     with respect to the original BigTable or current HBase design?  What
>     might be a benefit of doing it the other way?
>

The main thing Accumulo had to do differently from BigTable to allow
dynamic creation of column families was to create a default locality
group.  That's the locality group that stores any column family that isn't
assigned to some other locality group.  I recall Keith saying it was kind
of a pain to implement, but I don't see any obvious negative tradeoffs in
the design.
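The routing rule behind the default locality group can be sketched in a few lines. This is a minimal model assuming only what is described above (named groups hold explicitly assigned families, everything else lands in the default group); the class name `LocalityGroupRouter` and the `"<default>"` label are hypothetical and do not come from Accumulo's code.

```python
DEFAULT_GROUP = "<default>"  # hypothetical label for the default locality group


class LocalityGroupRouter:
    """Maps a column family to the locality group that would store it."""

    def __init__(self, groups):
        # groups: hypothetical mapping of group name -> set of column families
        self._family_to_group = {}
        for name, families in groups.items():
            for family in families:
                if family in self._family_to_group:
                    raise ValueError(f"family {family!r} assigned to two groups")
                self._family_to_group[family] = name

    def group_for(self, family):
        # A family never configured anywhere (e.g. one created on the fly by a
        # writer) still has a home: the default group.
        return self._family_to_group.get(family, DEFAULT_GROUP)


router = LocalityGroupRouter({"metadata": {"meta", "tags"}, "blobs": {"img"}})
for fam in ["meta", "img", "brand_new_family"]:
    print(fam, "->", router.group_for(fam))
# meta -> metadata
# img -> blobs
# brand_new_family -> <default>
```

Without a fallback like `DEFAULT_GROUP`, `group_for` would have no answer for an unconfigured family, which is roughly why a fixed-schema design can skip this piece. In the actual Java client, named groups are configured per table through `TableOperations.setLocalityGroups`.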

Billie

> Thanks,
> Sukant
>
Keith Turner 2012-11-06, 21:07
Adam Fuchs 2012-11-06, 21:41