Accumulo user mailing list: Accumulo design questions


Sukant Hajra 2012-11-06, 19:01
John Vines 2012-11-06, 19:16
Keith Turner 2012-11-06, 19:54
Re: Accumulo design questions
On Tue, Nov 6, 2012 at 11:01 AM, Sukant Hajra <[EMAIL PROTECTED]> wrote:

> I've been trying to understand Accumulo more deeply as we use it more. To
> supplement the online documentation and source, I've been referencing some
> blog articles on HBase (Lars George has written several), the HBase docs,
> and the BigTable paper.
>
> But I'm curious about some of the deviations of Accumulo from BigTable and
> HBase.
>
> The questions I have right now are:
>
>     1. Is the format of an RFile close to HFile version 1, HFile version 2,
>     or at this point is the format really its own thing? I found good
>     documentation on the HFile, but I haven't yet found similar documentation
>     on RFiles. There's the source code, but I haven't dug into that yet.
>

I think there is a separate HFile for each column family, isn't there? An
RFile stores all columns and all locality groups in a single file, which is
another reason you don't pay the same performance penalty for having lots
of column families in Accumulo.
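One way to read the point above: if each column family gets its own file, per-file costs grow with the number of families, while a single file holding every family does not grow that way. The following Java sketch is toy arithmetic under two simplifying assumptions stated in the comments (one file per family per flush versus one file per flush); the class and method names are hypothetical, and neither layout is meant as an exact description of HBase or RFile internals.

```java
import java.util.List;

/**
 * Toy file-count arithmetic for one tablet/region. Both layouts are
 * simplifying assumptions, not descriptions of either system's on-disk format.
 */
public class FileCountSketch {
    // Assumed per-family layout: each flush writes one file per column family.
    static int filesPerFamilyLayout(int families, int flushes) {
        return families * flushes;
    }

    // Assumed single-file layout: each flush writes one file holding every
    // column family, with locality groups as sections inside that file.
    // The family count is accepted only for symmetry; it does not change the result.
    static int singleFileLayout(int families, int flushes) {
        return flushes;
    }

    public static void main(String[] args) {
        int flushes = 5;
        for (int families : List.of(1, 10, 100)) {
            System.out.printf("%d families, %d flushes: per-family=%d files, single-file=%d files%n",
                    families, flushes,
                    filesPerFamilyLayout(families, flushes),
                    singleFileLayout(families, flushes));
        }
    }
}
```

With 100 families and 5 flushes the assumed per-family layout yields 500 files against 5, which is the kind of gap that makes "lots of column families" cheap or expensive.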
>
>     2. I understand that HBase doesn't do well with too many column families.
>     However, creating too many column families in HBase isn't likely anyway
>     because you can't (I believe) create them dynamically. Accumulo allows
>     you to create column families dynamically. But I wonder if this can come
>     at a cost. Is there a benefit to using column families less frequently
>     if possible in Accumulo? Or is the cost of using column families more or
>     less the same as using column qualifiers?
>
>     3. I guess one way families might be different from qualifiers relates
>     to HBase's recommendation to keep column family names short to avoid
>     needless storage waste. That should apply to Accumulo as well, right?
>
>     4. In supporting dynamic column families, was there a design trade-off
>     with respect to the original BigTable or current HBase design? What
>     might be a benefit of doing it the other way?
>

The main thing Accumulo had to do differently from BigTable to allow
dynamic creation of column families was to create a default locality
group. That's the locality group that stores any column family not
assigned to some other locality group. I recall Keith saying it was kind
of a pain to implement, but I don't see any obvious negative trade-offs in
the design.
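To make the default locality group concrete, here is a toy Java model, not Accumulo's implementation: families explicitly assigned to a named group go there, and every other family, including one a client writes for the first time, falls through to a catch-all default group. The class, its methods, and the `<default>` label are hypothetical. In Accumulo itself, locality groups are configured per table, for example with `TableOperations.setLocalityGroups`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of locality-group assignment with a default group (hypothetical, not Accumulo code). */
public class LocalityGroupSketch {
    // Hypothetical label for the catch-all group.
    static final String DEFAULT_GROUP = "<default>";

    private final Map<String, String> familyToGroup = new HashMap<>();

    // Configure a named locality group holding the given column families.
    void setGroup(String group, Set<String> families) {
        for (String family : families) {
            familyToGroup.put(family, group);
        }
    }

    // Any family not named by a group falls through to the default group,
    // so writing a brand-new family needs no schema change.
    String groupFor(String family) {
        return familyToGroup.getOrDefault(family, DEFAULT_GROUP);
    }

    // Partition the families seen in incoming data by group, the way a writer
    // might lay out separate sections of a single file.
    Map<String, Set<String>> partition(List<String> families) {
        Map<String, Set<String>> byGroup = new TreeMap<>();
        for (String family : families) {
            byGroup.computeIfAbsent(groupFor(family), g -> new TreeSet<>()).add(family);
        }
        return byGroup;
    }

    public static void main(String[] args) {
        LocalityGroupSketch groups = new LocalityGroupSketch();
        groups.setGroup("meta", Set.of("author", "date"));
        // "body" and "tags" were never configured, e.g. created on the fly by a client.
        System.out.println(groups.partition(List.of("author", "body", "date", "tags")));
        // prints {<default>=[body, tags], meta=[author, date]}
    }
}
```

The design choice this illustrates: because unassigned families always have somewhere to go, the set of families does not have to be declared up front, which is what dynamic column families require.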

Billie

> Thanks,
> Sukant
>
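Question 3 (whether HBase's advice to keep column family names short applies to Accumulo) is not addressed in this reply. As a rough way to reason about it, the following Java sketch compares a naive estimate, where every entry repeats its family name in full, with a hypothetical encoding that skips a family identical to the previous entry's family in key order. The class, methods, and encoding are assumptions for illustration; whether and how RFile avoids repeating key fields should be checked against the RFile source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

/**
 * Toy estimate of bytes spent on column family names. The "elided" variant is a
 * hypothetical encoding used for illustration, not a description of RFile.
 */
public class FamilyNameCostSketch {
    // Naive bound: every entry stores its column family bytes in full.
    static long naiveFamilyBytes(List<String> familiesInKeyOrder) {
        long total = 0;
        for (String family : familiesInKeyOrder) {
            total += family.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Hypothetical encoding: a family equal to the previous entry's family is
    // written as a "same as before" flag (the flag's own cost is not counted).
    static long elidedFamilyBytes(List<String> familiesInKeyOrder) {
        long total = 0;
        String previous = null;
        for (String family : familiesInKeyOrder) {
            if (!family.equals(previous)) {
                total += family.getBytes(StandardCharsets.UTF_8).length;
            }
            previous = family;
        }
        return total;
    }

    public static void main(String[] args) {
        // 1000 entries in key order, all in the same long family.
        List<String> sameFamily = Collections.nCopies(1000, "longFamilyName");
        System.out.println(naiveFamilyBytes(sameFamily));  // 14000
        System.out.println(elidedFamilyBytes(sameFamily)); // 14
        // Two families interleaved in key order: the elision never applies.
        List<String> alternating = List.of("meta", "payload", "meta", "payload");
        System.out.println(naiveFamilyBytes(alternating));  // 22
        System.out.println(elidedFamilyBytes(alternating)); // 22
    }
}
```

The second case shows why the answer depends on data layout: when families interleave in key order, a "same as previous" trick saves nothing and name length matters as much as in the naive estimate. Block compression, if enabled, may change both numbers further.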
Keith Turner 2012-11-06, 21:07
Adam Fuchs 2012-11-06, 21:41