Accumulo, mail # user - Accumulo Direct Reader


Denis 2012-10-17, 13:46
Eric Newton 2012-10-17, 14:57
Re: Accumulo Direct Reader
Keith Turner 2012-10-17, 15:13
On Wed, Oct 17, 2012 at 10:57 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
> See InputFormatBase#setScanOffline.

This uses org.apache.accumulo.core.client.impl.OfflineScanner, which
scans an offline table by going directly to its files. It does exactly
what the tablet server does when reading a tablet's files. I was
thinking of making OfflineScanner available through Connector somehow
when I added setScanOffline to the M/R code, but did not for some
reason. If there is interest, we could revisit this.
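To make "what the tablet server does when reading a tablet's files" concrete: a tablet's data is spread over several sorted files, and a reader merges them into one sorted stream in which, with the default versioning setting, the newest version of a key hides older ones. The following is a minimal, self-contained sketch of that idea (a k-way merge that keeps only the newest version per row), not Accumulo's actual OfflineScanner code. TabletMergeSketch and its simplified Entry type are hypothetical; real Accumulo keys also carry column family, qualifier, visibility and delete markers, and the real read path runs configurable iterators.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class TabletMergeSketch {

    // Hypothetical simplified entry; real Accumulo keys have more fields.
    record Entry(String row, long timestamp, String value) {}

    // Accumulo-style ordering: row ascending, then newest timestamp first.
    static final Comparator<Entry> ORDER =
            Comparator.comparing(Entry::row)
                    .thenComparing(Comparator.comparingLong(Entry::timestamp).reversed());

    // Heap node: the current head entry of one file plus that file's iterator.
    private record Head(Entry entry, Iterator<Entry> source) {}

    /** Merges already-sorted "files" and keeps only the newest version of each row. */
    static List<Entry> mergeLatest(List<List<Entry>> sortedFiles) {
        PriorityQueue<Head> heap = new PriorityQueue<>(Comparator.comparing(Head::entry, ORDER));
        for (List<Entry> file : sortedFiles) {
            Iterator<Entry> it = file.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Entry> out = new ArrayList<>();
        String lastRow = null;
        while (!heap.isEmpty()) {
            Head head = heap.poll();
            // The first entry popped for a row is its newest version; skip older ones.
            if (!head.entry().row().equals(lastRow)) {
                out.add(head.entry());
                lastRow = head.entry().row();
            }
            if (head.source().hasNext()) {
                heap.add(new Head(head.source().next(), head.source()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> olderFile = List.of(new Entry("a", 1, "a-old"), new Entry("c", 1, "c-only"));
        List<Entry> newerFile = List.of(new Entry("a", 5, "a-new"), new Entry("b", 3, "b-only"));
        for (Entry e : mergeLatest(List.of(olderFile, newerFile))) {
            System.out.println(e.row() + " -> " + e.value());
        }
    }
}
```

Running main prints "a -> a-new", "b -> b-only" and "c -> c-only": the older entry for row a is hidden by the newer one, regardless of which file each came from.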

>
> Clone a table, take the clone offline, and then use it as the input
> table for your map/reduce job. This will preserve a consistent view of
> the underlying files, without going through the tablet servers.
>
> -Eric
>
> On Wed, Oct 17, 2012 at 9:46 AM, Denis <[EMAIL PROTECTED]> wrote:
>>     Hi.
>>
>>     I am thinking about creating a Direct Reader for Accumulo.
>>
>>     That is, a library with an API compatible with the Accumulo client
>> that reads .rf files directly from HDFS, bypassing the tservers.
>>
>>     Motivation is:
>>
>>     1. To be able to read stalled data quickly when its tserver is
>> busy (rebalancing, reading logs, etc.) or has just gone down and its
>> tablets have not been redistributed yet.
>>
>>     2. If the table is read-only or can tolerate eventual consistency,
>> many readers can work in parallel without the tserver becoming a
>> bottleneck. Also, the table's data becomes local on three servers (the
>> number of HDFS replicas) instead of one.
>>
>>     3. Distribution of data: analysts can download .rf files (even to
>> a laptop) and run their software locally.
>>
>>     Any suggestions?
>>
>>     Thanks.
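Eric's clone-then-offline suggestion works because an Accumulo clone initially shares the original table's files instead of copying them, so writes made to the original afterwards do not change what the clone sees. The toy model below illustrates only that property. Table, writeFile, cloneAs and readFilesDirectly are hypothetical names, not Accumulo API calls, and the model ignores in-memory data, compactions and file garbage collection. Refusing direct reads of an online table is a modeling choice that mirrors the "take it offline first" step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CloneSnapshotSketch {

    // Hypothetical model: a table is an ordered list (oldest first) of
    // references to immutable, sorted files. Not the Accumulo API.
    static final class Table {
        final String name;
        final List<Map<String, String>> files = new ArrayList<>();
        boolean online = true;

        Table(String name) {
            this.name = name;
        }

        // A batch of writes lands as a new immutable file (stand-in for a flush).
        void writeFile(Map<String, String> data) {
            if (!online) {
                throw new IllegalStateException(name + " is offline");
            }
            files.add(Collections.unmodifiableMap(new TreeMap<>(data)));
        }

        // Cloning copies only the file references; no data is copied.
        Table cloneAs(String newName) {
            Table copy = new Table(newName);
            copy.files.addAll(files);
            return copy;
        }

        // Stand-in for reading the files directly, bypassing any server.
        // Modeling assumption: only allowed once the table is offline.
        Map<String, String> readFilesDirectly() {
            if (online) {
                throw new IllegalStateException("take " + name + " offline first");
            }
            TreeMap<String, String> view = new TreeMap<>();
            for (Map<String, String> file : files) {
                view.putAll(file); // newer files override older ones
            }
            return view;
        }
    }

    public static void main(String[] args) {
        Table events = new Table("events");
        events.writeFile(Map.of("r1", "v1", "r2", "v2"));

        Table snapshot = events.cloneAs("events_snapshot");
        snapshot.online = false;

        // The original keeps accepting writes after the clone was made...
        events.writeFile(Map.of("r2", "v2-updated", "r3", "v3"));

        // ...but the offline clone still sees only the files it was cloned with.
        System.out.println(snapshot.readFilesDirectly());
    }
}
```

Running main prints {r1=v1, r2=v2}: the later update to r2 and the new row r3 went into a file that only the original table references.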
Marc Parisi 2012-10-17, 14:03