Accumulo >> mail # user >> checkpointing

Did a checkpoint feature ever get added?

If not, would it still be possible to do so, perhaps by taking the table to be checkpointed offline (or compacting it, or whatever), then copying the relevant parts of the metadata table to another table? Then, for the rollback / restore process, one would simply copy the metadata back into the !METADATA table.

Of course, the garbage collector would have to know not to garbage collect files from the checkpoint.

It would probably be easier to implement by marking entries in the !METADATA table as part of a checkpoint; those entries could later be unmarked to 'delete' the checkpoint.
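
To make the garbage collection concern concrete, here is a rough sketch in plain Python. It is a toy model only: the file names, the `checkpoints` dictionary, and the `gc_candidates` function are all made up for illustration and do not reflect Accumulo's actual metadata schema or garbage collector.

```python
# Toy model only: file names and the checkpoint "marker" are hypothetical,
# not Accumulo's real metadata layout or GC logic.

def gc_candidates(all_files, live_refs, checkpoint_refs):
    """Return files referenced neither by the live table metadata
    nor by any marked checkpoint, i.e. files that are safe to delete."""
    protected = set(live_refs)
    for refs in checkpoint_refs.values():
        protected |= set(refs)
    return sorted(set(all_files) - protected)

all_files = ["f1.rf", "f2.rf", "f3.rf"]
live_refs = ["f3.rf"]                            # table was compacted after the checkpoint
checkpoints = {"before_job": ["f1.rf", "f2.rf"]}  # files pinned by the checkpoint

# While the checkpoint is marked, its files must not be collected.
while_marked = gc_candidates(all_files, live_refs, checkpoints)

# "Unmarking" the checkpoint releases its files to the collector.
del checkpoints["before_job"]
after_unmark = gc_candidates(all_files, live_refs, checkpoints)
```

The point of the sketch is only that the collector has to treat checkpoint references the same way it treats live references until the checkpoint is unmarked.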

This feature would be very useful in building aggregate tables, when it's possible that some new additions may get messed up. In particular, during MapReduce jobs that write to an aggregated Accumulo table, speculative execution and retried tasks that already wrote some results can cause double counting / aggregation of some entries. It'd be very nice if one could checkpoint an aggregated table before starting such a job, in case failures corrupt the counts.
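
The double counting problem can be illustrated with a small simulation in plain Python. This is only a sketch under stated assumptions: the table is modeled as an in-memory dictionary with summing semantics, and `apply_task_output` is a hypothetical helper, not an Accumulo or Hadoop API.

```python
import copy
from collections import defaultdict

def apply_task_output(table, pairs):
    """Add each (key, count) pair into the aggregate table (summing semantics)."""
    for key, count in pairs:
        table[key] += count

# Hypothetical aggregate table before the job starts.
table = defaultdict(int, {"a": 10, "b": 5})
checkpoint = copy.deepcopy(table)          # "checkpoint" taken before the job

task_output = [("a", 1), ("b", 2)]

# First attempt writes ("a", 1) and then fails; the retry writes everything.
apply_task_output(table, task_output[:1])
apply_task_output(table, task_output)
corrupted = dict(table)                    # "a" was incremented twice

# Roll back to the checkpoint and rerun the task cleanly.
table = copy.deepcopy(checkpoint)
apply_task_output(table, task_output)
restored = dict(table)
```

Here `corrupted` ends up with "a" at 12 instead of the intended 11, while restoring from the checkpoint and rerunning gives the correct counts.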