Accumulo user mailing list

Re: Accumulo Direct Reader
See InputFormatBase#setScanOffline.

Clone the table, take the clone offline, and then use the clone as the
input table for your map/reduce job. This preserves a consistent view of
the underlying files and reads them without going through the tablet
servers.
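To make the idea concrete, here is a toy, in-memory Java model of why clone-then-offline gives a stable snapshot. Everything in it (ToyTable, cloneAs, takeOffline, compact, isReferenced, the file names) is hypothetical and is not Accumulo's API; the real steps would go through TableOperations.clone, TableOperations.offline and InputFormatBase#setScanOffline. It assumes, as Accumulo's clone does, that a clone shares file references with its source instead of copying data, and that an offline table is never compacted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CloneSnapshotSketch {

    // Hypothetical toy model of a table's metadata: just the RFile names it
    // references. This is NOT Accumulo's API; it only mimics the idea.
    public static final class ToyTable {
        private final String name;
        private List<String> files;
        private boolean online = true;

        public ToyTable(String name, List<String> files) {
            this.name = name;
            this.files = new ArrayList<>(files);
        }

        // Cloning copies the file *references*; no data is rewritten.
        public ToyTable cloneAs(String newName) {
            return new ToyTable(newName, files);
        }

        // Offline means no tablet server hosts the table, so nothing
        // compacts it and its file list stops changing.
        public void takeOffline() {
            online = false;
        }

        // A full major compaction on an online table replaces all of its
        // files with one new file.
        public void compact(String outputFile) {
            if (!online) {
                throw new IllegalStateException(name + " is offline");
            }
            files = new ArrayList<>(Collections.singletonList(outputFile));
        }

        public List<String> files() {
            return Collections.unmodifiableList(files);
        }
    }

    // In this model a file may be garbage-collected only when no table
    // references it any more.
    public static boolean isReferenced(String file, List<ToyTable> tables) {
        for (ToyTable t : tables) {
            if (t.files().contains(file)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyTable source = new ToyTable("events", Arrays.asList("F1.rf", "F2.rf"));
        ToyTable snapshot = source.cloneAs("events_snapshot");
        snapshot.takeOffline();

        // The live table keeps changing...
        source.compact("F3.rf");

        // ...but the offline clone still points at the original files,
        // which a job could then read directly from HDFS.
        System.out.println("source:   " + source.files());   // [F3.rf]
        System.out.println("snapshot: " + snapshot.files()); // [F1.rf, F2.rf]
        System.out.println("F1.rf still referenced: "
                + isReferenced("F1.rf", Arrays.asList(source, snapshot))); // true
    }
}
```

Because the clone only adds references, it is cheap to create; the trade-off is that the shared files stay on HDFS until the clone is deleted.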


On Wed, Oct 17, 2012 at 9:46 AM, Denis <[EMAIL PROTECTED]> wrote:
>     Hi.
>     I am thinking about creating a Direct Reader for Accumulo: a library
> with an API compatible with the Accumulo client that reads .rf files
> directly from HDFS, bypassing the tservers.
>     The motivation is:
>     1. To be able to read data quickly even when its tserver is busy
> (rebalancing, reading logs, etc.) or has just gone down and its tablets
> have not been reassigned yet.
>     2. If the table is read-only or can tolerate eventual consistency,
> many readers can work in parallel without the tserver becoming a
> bottleneck. Also, the table's data is local to three servers (the HDFS
> replication factor) instead of one.
>     3. Distribution of data: analysts can download .rf files (even to
> a laptop) and run their own software locally.
>     Any suggestions?
>     Thanks.