Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Threaded View
Accumulo >> mail # dev >> Is Data Locality Helpful? (or why run tserver and datanode on the same box?)


Copy link to this message
-
Re: Is Data Locality Helpful? (or why run tserver and datanode on the same box?)
AFAIK, the locality may not be guaranteed right away unless the data for a
tablet was first ingested on the tablet server that is responsible for that
tablet, otherwise you'll need to wait for a major compaction to rewrite the
RFiles locally on the tablet server. I would assume if the tablet server is
not on the same node as the datanode, those files will probably be spread
across the cluster as if you were ingesting data from outside the cloud.

A recent discussion with Bill Slacum also brought to light a possible
problem of the HDFS balancer [1] re-balancing blocks after the fact which
could eventually pull blocks onto datanodes that are not local to the
tablets. I believe remedy for this was to turn off the balancer or not have
it run.

[1]
http://www.swiss-scalability.com/2013/08/hadoop-hdfs-balancer-explained.html
On Thu, Jun 19, 2014 at 10:07 AM, David Medinets <[EMAIL PROTECTED]>
wrote:
 
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB