Accumulo >> mail # user >> Accumulo Direct Reader


Denis 2012-10-17, 13:46
Re: Accumulo Direct Reader
See InputFormatBase#setScanOffline.

Clone a table, take it offline and then use it as your map/reduce
input format.  This will preserve a consistent view of the underlying
files, without going through the tablet servers.
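The order of steps above can be sketched roughly as follows. This is only an illustration of the sequence (clone, take offline, point the job at the clone with offline scanning enabled). The TableAdmin and JobSetup interfaces, the Recorder stub, and the "_offline_clone" suffix are hypothetical stand-ins invented for this sketch, not the Accumulo client API; the exact signatures of the real clone/offline table operations and of InputFormatBase#setScanOffline should be checked against the javadoc of the Accumulo version in use.

```java
import java.util.ArrayList;
import java.util.List;

public class OfflineScanSketch {

    /** Hypothetical stand-in for table administration; NOT the Accumulo API. */
    interface TableAdmin {
        void cloneTable(String source, String clone); // assumption: maps to the client's table clone operation
        void takeOffline(String table);               // assumption: maps to the client's table offline operation
    }

    /** Hypothetical stand-in for map/reduce job configuration; NOT the Accumulo API. */
    interface JobSetup {
        void setInputTable(String table);
        void setScanOffline(boolean offline);         // named after InputFormatBase#setScanOffline from the message
    }

    /**
     * Clone the source table, take the clone offline, and configure the job
     * to read the clone's files directly. Returns the clone's name.
     */
    static String prepareOfflineInput(TableAdmin admin, JobSetup job, String sourceTable) {
        String clone = sourceTable + "_offline_clone"; // hypothetical naming scheme
        admin.cloneTable(sourceTable, clone);          // 1. clone: the source stays online and writable
        admin.takeOffline(clone);                      // 2. offline: the clone's file set no longer changes
        job.setInputTable(clone);                      // 3. use the clone as the job input
        job.setScanOffline(true);                      // 4. read files without going through tablet servers
        return clone;
    }

    /** In-memory stub that records calls, so the sketch runs without a cluster. */
    static final class Recorder implements TableAdmin, JobSetup {
        final List<String> calls = new ArrayList<>();

        @Override public void cloneTable(String source, String clone) { calls.add("clone " + source + " -> " + clone); }
        @Override public void takeOffline(String table) { calls.add("offline " + table); }
        @Override public void setInputTable(String table) { calls.add("input " + table); }
        @Override public void setScanOffline(boolean offline) { calls.add("scanOffline " + offline); }
    }

    public static void main(String[] args) {
        Recorder recorder = new Recorder();
        String clone = prepareOfflineInput(recorder, recorder, "events");
        System.out.println("job input table: " + clone);
        recorder.calls.forEach(System.out::println);
    }
}
```

In a real job the two interfaces would be replaced by the connector's table operations and the input format's static configuration methods; the point of the sketch is only that the clone must be offline before the job is configured to scan it offline.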

-Eric

On Wed, Oct 17, 2012 at 9:46 AM, Denis <[EMAIL PROTECTED]> wrote:
>     Hi.
>
>     I am thinking about creating a Direct Reader for Accumulo.
>
>     A library whose API is compatible with the Accumulo client but
> which reads .rf-files directly from HDFS, bypassing the tservers.
>
>     The motivation is:
>
>     1. To be able to quickly read stale data when the tserver is
> busy (with re-balancing, reading logs, etc.) or has just gone down
> and its tablets have not been redistributed yet.
>
>     2. If the table is read-only or can tolerate eventual consistency,
> many readers can work in parallel without the tserver becoming a
> bottleneck. Also, the table's data is local on three servers (the
> number of HDFS replicas) instead of one.
>
>     3. Distribution of data: analysts can download .rf-files (even to
> a laptop) and run their software locally.
>
>     Any suggestions?
>
>     Thanks.
Keith Turner 2012-10-17, 15:13
Marc Parisi 2012-10-17, 14:03