Accumulo >> mail # user >> Major Compacting ISAMs


Hugh Xedni 2012-07-27, 15:23
John Armstrong 2012-07-27, 15:35
Re: Major Compacting ISAMs
John is spot on. One additional implication is worth mentioning: if you
keep writing new data to your table, you need to pick a table structure
that doesn't add more data to existing tablets over time. Depending on
what type of indexing you would like to use, this generally means using a
document-partitioned structure like the one in the WikiSearch example:
http://accumulo.apache.org/example/wikisearch.html

For some problems (like building a graph or an RDF triple store) this isn't
really feasible, and you will eventually need to major compact.
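
To make the document-partitioned idea concrete, here is a rough sketch of how
row IDs could be built so that new data lands in new partitions rather than in
tablets that already hold a bulk-loaded file. The date prefix, the shard count,
and the name partition_row are all assumptions for illustration; the WikiSearch
example may lay out its partitions differently.

```python
import hashlib
from datetime import date

NUM_SHARDS = 4  # hypothetical number of partitions per ingest day


def partition_row(doc_id: str, ingest_day: date, num_shards: int = NUM_SHARDS) -> str:
    """Build a row ID of the form YYYYMMDD_<shard> for one document.

    The date prefix means rows for a later day always sort after rows for an
    earlier day, so a later bulk load only touches new key ranges.
    """
    # Hash the document ID so documents spread evenly over the day's shards.
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % num_shards
    return f"{ingest_day:%Y%m%d}_{shard:02d}"
```

All entries for a document (the document itself and its index terms) would go
under the row returned for that document, which is what keeps each partition
self-contained.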

Cheers,
Adam
On Fri, Jul 27, 2012 at 11:35 AM, John Armstrong <[EMAIL PROTECTED]> wrote:

> On 07/27/2012 11:23 AM, Hugh Xedni wrote:
>
>> If I load sorted key-value map or ISAM files into HDFS via bulk loading,
>> how can I ensure only one file will be assigned to a tablet and major
>> compaction is avoided?
>>
>
> I think (and those more knowledgeable will correct me if I'm wrong) that
> you could achieve this by
>
> (a) making sure that all your bulk-load files contain non-overlapping
> Accumulo key ranges,
>
> (b) making sure that each file is smaller than the maximum tablet size
> on the table, and
>
> (c) setting the table splits to the file key range boundaries before bulk
> importing.
>
> These should be sufficient conditions, though possibly (likely?) not
> necessary.
>
> hth
>
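
Conditions (a) through (c) above can be checked before the import with a small
planning step. The sketch below is not Accumulo code: BulkFile and plan_splits
are hypothetical names, and the first row, last row, and size of each file are
assumed to be known already. It relies on an Accumulo tablet covering the rows
after the previous split point up to and including its own split point, so the
last row of each file (except the final one) can serve as a split.

```python
from typing import List, NamedTuple


class BulkFile(NamedTuple):
    """Summary of one sorted bulk-load file (hypothetical helper type)."""
    name: str
    first_row: str   # smallest row in the file
    last_row: str    # largest row in the file
    size_bytes: int


def plan_splits(files: List[BulkFile], max_tablet_bytes: int) -> List[str]:
    """Return split points that give every file its own tablet.

    Raises ValueError if a file is too large (condition b) or if two files
    have overlapping row ranges (condition a).
    """
    # Note: Python compares str by code point; Accumulo compares bytes.
    # The two orders agree for ASCII rows, which this sketch assumes.
    ordered = sorted(files, key=lambda f: f.first_row)

    for f in ordered:
        if f.first_row > f.last_row:
            raise ValueError(f"{f.name}: first row sorts after last row")
        if f.size_bytes >= max_tablet_bytes:
            raise ValueError(f"{f.name}: file is not smaller than the tablet size")

    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.last_row >= nxt.first_row:
            raise ValueError(f"{prev.name} and {nxt.name} overlap")

    # Condition (c): split at the last row of every file but the final one.
    return [f.last_row for f in ordered[:-1]]
```

The returned rows would then be added as splits on the table before the bulk
import, which in the Java client should correspond to
TableOperations.addSplits followed by TableOperations.importDirectory, with
max_tablet_bytes taken from the table's table.split.threshold setting.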