HBase >> mail # user >> RS, TT, shared DN and good performance on random Hbase random reads.


Re: RS, TT, shared DN and good performance on random Hbase random reads.
Hi Marc,

On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
> The reasons for that would be:
> -After running full compaction, HFiles end up in the RS nodes, so would
> achieve data locality.
> -As I have replication factor 3 and just 2 Hbase nodes, I know that no map
> task would try to read in the RS nodes. The reduce tasks will write first in
> the node where they exist (which will never be a RS node).
> -So, in the RS I would end up having the Hbase tables and block replicas of
> the MR jobs that will never be read (as Maps do data locality and at least a
> replica of each block will be in a MR node)

Just to keep in mind: all HBase read/write requests go through the
RS. The HDFS blocks held by the RS are never accessed directly by any
HBase client (the RS is THE data server for HBase clients).

> In case this would work, if I add more nodes with RS and datanode, could I
> guarantee that no map task would ever read in them? (assuming that a reduce
> task always writes first in the node where it exists, correct me if I'm
> wrong please as I'm not sure about this).

Yes, you can guarantee this to a certain extent. Where data locality
is absent for some tasks (due to scheduling constraints), a few blocks
may be read from the RS nodes' DNs, but the impact shouldn't be big,
given that a good MR scheduler usually avoids having to do
that.
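
To illustrate the reasoning above, here is a rough sketch of a simplified
placement model. It assumes the writer (a reduce task) always runs on an MR
node, that the first replica lands on the writer's node, and that the
remaining replicas go to random other nodes; rack awareness is ignored. The
node names, block count and function names are made up for the example and
are not taken from any Hadoop API.

```python
import random

def place_replicas(writer, nodes, replication=3, rng=random):
    """Simplified placement: first replica on the writer's node, the rest
    on distinct other nodes picked at random (rack awareness ignored)."""
    others = [n for n in nodes if n != writer]
    return [writer] + rng.sample(others, replication - 1)

def simulate(mr_nodes, rs_nodes, blocks=1000, seed=0):
    """Count blocks that have a replica on an MR node and blocks that have
    a replica on an RS node, under the simplified model above."""
    rng = random.Random(seed)
    all_nodes = mr_nodes + rs_nodes
    local_possible = 0
    touching_rs = 0
    for _ in range(blocks):
        # Assumption: reduce tasks (the writers) only ever run on MR nodes.
        writer = rng.choice(mr_nodes)
        replicas = place_replicas(writer, all_nodes, rng=rng)
        if any(r in mr_nodes for r in replicas):
            local_possible += 1
        if any(r in rs_nodes for r in replicas):
            touching_rs += 1
    return local_possible, touching_rs

# Hypothetical cluster: four MR-only nodes, two RS nodes sharing a DN.
MR_NODES = ["mr1", "mr2", "mr3", "mr4"]
RS_NODES = ["rs1", "rs2"]

local_possible, touching_rs = simulate(MR_NODES, RS_NODES)
print(f"blocks with a replica on an MR node: {local_possible}/1000")
print(f"blocks with a replica on an RS node: {touching_rs}/1000")
```

In this model every block keeps a replica on an MR node because the writer's
own node always holds the first one, however many RS nodes are added. A map
task only has to read from an RS node's DN when the scheduler cannot place
it next to one of the MR-side replicas.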

Alternatively, you can also consider running low-slotted TTs on the RS
machines, to put them to use but in a safer way.
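
As a sketch of what "low-slotted" could look like in practice: on Hadoop 1.x
(MR1), the per-TaskTracker slot counts are set with
mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml. The small helper
below (a hypothetical name, only for illustration) renders such a fragment.
The values of 1 map slot and 1 reduce slot are an assumption for the
example, not a recommendation; the right numbers depend on how much headroom
the RS needs.

```python
from xml.sax.saxutils import escape

def tasktracker_slot_properties(map_slots, reduce_slots):
    """Render mapred-site.xml <property> entries that cap the number of
    concurrent map and reduce tasks on one TaskTracker (MR1 settings)."""
    props = {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
    }
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{int(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)

# Illustrative values for a TT that shares a machine with a RS.
RS_NODE_FRAGMENT = tasktracker_slot_properties(map_slots=1, reduce_slots=1)
print(RS_NODE_FRAGMENT)
```

The printed entries would go inside the <configuration> element of
mapred-site.xml on the RS machines only, leaving the MR-only nodes at their
usual slot counts.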

--
Harsh J