Accumulo, mail # dev - Review Request 19790: ACCUMULO-378 Design document


Re: Review Request 19790: ACCUMULO-378 Design document
Josh Elser 2014-03-31, 16:40

No, the intent was to support replication from one cluster to N clusters. We could make this detail transparent by including the destination in the table where we store references to data that needs to be replicated. The cost is storing N*M records instead of just M records, where N is the number of clusters the source is replicating to and M is the number of references to data that needs to be replicated. The more I think about it, the more I think it's definitely worth it.
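To make the N*M versus M trade-off concrete, here is a minimal sketch of the two table layouts. The record shapes, the "pending"/"replicated" states, and the file and peer names are all hypothetical illustrations, not the actual Accumulo replication schema:

```python
from itertools import product


def per_destination_records(file_refs, peers):
    """Hypothetical layout: one record per (file reference, destination peer).

    Each destination's progress is tracked independently, so the table
    holds N * M records for M references and N peers."""
    return {(ref, peer): "pending" for ref, peer in product(file_refs, peers)}


def shared_records(file_refs):
    """Hypothetical alternative: one record per file reference (M records),
    leaving per-destination tracking to some other mechanism."""
    return {ref: "pending" for ref in file_refs}


refs = ["wal-0001", "wal-0002", "wal-0003"]  # M = 3 (hypothetical names)
peers = ["peerA", "peerB"]                   # N = 2 (hypothetical names)

table = per_destination_records(refs, peers)
flat = shared_records(refs)

# Finishing replication of one file to one peer does not affect the other peer.
table[("wal-0001", "peerA")] = "replicated"
remaining = [key for key, state in table.items() if state == "pending"]

print(len(table), len(flat), len(remaining))  # 6 3 5
```

The extra records buy independence: each destination's progress is a separate row, so a slow or unreachable peer never blocks bookkeeping for the others.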

The biggest reason for using them is that they drastically reduce the latency before data can *begin* the replication process. We certainly could use RFiles for everything, which would simplify things, but I'm worried about the latency that would incur. If we used RFiles, the only way I can come up with to reduce the latency before replication even begins would be to increase the frequency of minor compactions (mincs). Maybe that's sufficient for a first pass? I think I need to back this opinion up with some numbers.

Right now, we tend to recommend a bigger in-memory map for increased ingest performance. The worry is that this recommendation would now also come with increased replication latency.
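A rough back-of-envelope model of that worry, as a sketch only. It assumes data becomes eligible for replication only once a minor compaction flushes it to an RFile, that a minc fires only when the in-memory map is full, and that ingest is steady. Under those simplifying assumptions, the worst-case wait is the map size divided by the ingest rate. The map sizes and rate below are hypothetical inputs, not measurements:

```python
def worst_case_wait_seconds(map_size_bytes: float, ingest_bytes_per_sec: float) -> float:
    """Upper bound on how long a write waits before reaching an RFile,
    assuming (hypothetically) a minor compaction fires only when the
    in-memory map is full and ingest into the map is steady."""
    if ingest_bytes_per_sec <= 0:
        raise ValueError("ingest rate must be positive")
    return map_size_bytes / ingest_bytes_per_sec


MiB = 1024 * 1024

# Hypothetical inputs for illustration only.
small_map_wait = worst_case_wait_seconds(256 * MiB, 10 * MiB)   # 25.6 s
large_map_wait = worst_case_wait_seconds(1024 * MiB, 10 * MiB)  # 102.4 s
```

In this model, quadrupling the map quadruples the wait before replication can begin. A path that does not have to wait for the flush avoids that coupling.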
- Josh
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19790/#review39051
On March 28, 2014, 5:54 p.m., kturner wrote: