Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Accumulo >> mail # user >> Accumulo Direct Reader


+
Denis 2012-10-17, 13:46
Copy link to this message
-
Re: Accumulo Direct Reader
See InputFormatBase#setScanOffline.

Clone a table, take it offline and then use it as your map/reduce
input format.  This will preserve a consistent view of the underlying
files, without going through the tablet servers.

-Eric

On Wed, Oct 17, 2012 at 9:46 AM, Denis <[EMAIL PROTECTED]> wrote:
>     Hi.
>
>     I am thinking about creating a Direct Reader for Accumulo.
>
>     A library which has API compatible with the Accumulo client but
> reads .rf-files directly from HDFS, bypassing tservers.
>
>     Motivation is:
>
>     1. To have a possibility to quickly read stalled data when the
> tserver is busy (with re-balancing, reading logs, etc) or just went
> down and its tablets are not redistributed yet.
>
>     2. If the table is read-only or can afford eventual consistency,
> many readers can work in parallel with no bottleneck of tserver. Also,
> the table's data becomes local on three (number of HDFS replicas)
> servers instead of one.
>
>     3. Distribution of data: analytics can download .rf-files (even to
> a laptop) and run their software locally.
>
>     Any suggestions ?
>
>     Thanks.
+
Keith Turner 2012-10-17, 15:13
+
Marc Parisi 2012-10-17, 14:03