Clone the table, take the clone offline, and then use the clone as the
input to your map/reduce job. This gives you a consistent view of the
underlying files without going through the tablet servers.
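
A rough sketch of those three steps, in case it helps. It assumes the
1.5-style client API (older releases name the input format methods
differently), and the instance name, ZooKeeper hosts, credentials and table
names are all placeholders I made up. The mapper and output setup are left
out, so treat it as an outline rather than a finished job:

```java
import java.util.Collections;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OfflineCloneJob {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details (assumptions, not from this thread).
    String instanceName = "myInstance";
    String zooKeepers = "zk1:2181,zk2:2181";
    String user = "reader";
    String password = "secret";
    String sourceTable = "mytable";
    String cloneTable = "mytable_snapshot";

    Connector conn = new ZooKeeperInstance(instanceName, zooKeepers)
        .getConnector(user, new PasswordToken(password));

    // 1. Clone the table. flush=true writes in-memory data to files first,
    //    so the clone includes recent writes. No property overrides.
    conn.tableOperations().clone(sourceTable, cloneTable, true,
        Collections.<String, String>emptyMap(),
        Collections.<String>emptySet());

    // 2. Take the clone offline so its files stop changing. Depending on
    //    the version, this call may return before the table is fully
    //    offline, so the job may need to wait before computing splits.
    conn.tableOperations().offline(cloneTable);

    // 3. Point the input format at the offline clone. With the offline
    //    scan enabled, map tasks read the files from HDFS directly
    //    instead of asking the tablet servers.
    Job job = Job.getInstance(new Configuration(), "scan-offline-clone");
    job.setInputFormatClass(AccumuloInputFormat.class);
    AccumuloInputFormat.setConnectorInfo(job, user, new PasswordToken(password));
    AccumuloInputFormat.setZooKeeperInstance(job, instanceName, zooKeepers);
    AccumuloInputFormat.setInputTableName(job, cloneTable);
    AccumuloInputFormat.setScanAuthorizations(job, new Authorizations());
    AccumuloInputFormat.setOfflineTableScan(job, true);

    // Mapper, reducer and output format would be configured here before
    // submitting the job.
  }
}
```

The original table stays online the whole time; only the clone is taken
offline, and it can be deleted once the job finishes.
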
On Wed, Oct 17, 2012 at 9:46 AM, Denis <[EMAIL PROTECTED]> wrote:
> I am thinking about creating a Direct Reader for Accumulo:
> a library with an API compatible with the Accumulo client that
> reads .rf-files directly from HDFS, bypassing the tservers.
> Motivation is:
> 1. To be able to quickly read stale data when the
> tserver is busy (with re-balancing, reading logs, etc.) or has just gone
> down and its tablets have not been redistributed yet.
> 2. If the table is read-only or can afford eventual consistency,
> many readers can work in parallel without the tserver becoming a
> bottleneck. Also, the table's data becomes local on three (the number
> of HDFS replicas) servers instead of one.
> 3. Distribution of data: analysts can download .rf-files (even to
> a laptop) and run their software locally.
> Any suggestions?