Re: Accumulo Direct Reader
Keith Turner 2012-10-17, 15:13
On Wed, Oct 17, 2012 at 10:57 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
> See InputFormatBase#setScanOffline.
This uses o.a.a.c.client.impl.OfflineScanner. OfflineScanner will
scan an offline table by going directly to the files. It does the
exact same thing the tablet server does when reading a tablet's files.
I was thinking of making OfflineScanner available through Connector
somehow when adding setScanOffline to M/R code, but did not for some
reason. If there is interest we could revisit this.
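
To illustrate what "going directly to the files" involves: a tablet's
data lives in several individually sorted files, so any direct reader
has to merge them into one sorted stream. The following is only a
conceptual sketch of that merge, not the OfflineScanner code. The class
MergedFileRead and its methods are made up for illustration, plain
in-memory lists of keys stand in for open .rf files, and it ignores
things the real read path also handles, such as deletes and versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: none of these names are Accumulo classes.
public class MergedFileRead {

    /** One sorted source (stand-in for an open file) plus its current head key. */
    private static final class Source implements Comparable<Source> {
        private final Iterator<String> it;
        private String head;

        Source(Iterator<String> it) {
            this.it = it;
            this.head = it.next(); // caller guarantees the source is non-empty
        }

        /** Move to the next key; returns false once the source is exhausted. */
        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }

        @Override
        public int compareTo(Source other) {
            return head.compareTo(other.head);
        }
    }

    /** Merge several individually sorted key lists into one sorted list. */
    public static List<String> merge(List<List<String>> files) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        for (List<String> file : files) {
            if (!file.isEmpty()) {
                heap.add(new Source(file.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source smallest = heap.poll();  // source whose head key is lowest
            out.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest);         // re-insert under its new head key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("row1", "row4"),
            Arrays.asList("row2", "row3", "row5"));
        System.out.println(merge(files)); // prints [row1, row2, row3, row4, row5]
    }
}
```

The heap keeps one entry per file, so each key costs a log(number of
files) step no matter how large the files are.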
> Clone a table, take it offline and then use it as your map/reduce
> input format. This will preserve a consistent view of the underlying
> files, without going through the tablet servers.
> On Wed, Oct 17, 2012 at 9:46 AM, Denis <[EMAIL PROTECTED]> wrote:
>> I am thinking about creating a Direct Reader for Accumulo.
>> A library which has an API compatible with the Accumulo client but
>> reads .rf-files directly from HDFS, bypassing tservers.
>> Motivation is:
>> 1. To be able to quickly read stale data when the
>> tserver is busy (with re-balancing, reading logs, etc.) or has just gone
>> down and its tablets are not redistributed yet.
>> 2. If the table is read-only or can afford eventual consistency,
>> many readers can work in parallel without the tserver as a bottleneck. Also,
>> the table's data becomes local on three (number of HDFS replicas)
>> servers instead of one.
>> 3. Distribution of data: analysts can download .rf-files (even to
>> a laptop) and run their software locally.
>> Any suggestions?