Re: RS, TT, shared DN and good performance on random Hbase random reads.
Harsh J 2012-08-25, 13:32
On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
> The reasons for that would be:
> -After running full compaction, HFiles end up in the RS nodes, so would
> achieve data locality.
> -As I have replication factor 3 and just 2 Hbase nodes, I know that no map
> task would try to read in the RS nodes. The reduce tasks will write first in
> the node where they exist (which will never be a RS node).
> -So, in the RS I would end up having the Hbase tables and block replicas of
> the MR jobs that will never be read (as Maps do data locality and at least a
> replica of each block will be in a MR node)
Just to keep in mind: all HBase read/write requests are made via the
RS. The HDFS blocks held on the RS nodes aren't directly accessed by
any client (the RS is THE data server for HBase clients).
> In case this would work, if I add more nodes with RS and datanode, could I
> guarantee that no map task would ever read in them? (assuming that a reduce
> task always writes first in the node where it exists, correct me if I'm
> wrong please as I'm not sure about this).
Yes, you can guarantee this to a certain extent. If data locality is
absent for some tasks (due to scheduling constraints), a few blocks
may be read from the DNs on the RS nodes, but this shouldn't have a
big impact, given that a good MR scheduler usually helps avoid such
non-local reads.
Alternatively, you can also consider running low-slotted TTs on the
RS machines, to make use of them in a safer way.
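As a rough sketch of what "low-slotted" could look like: on the RS
machines only, the TT's mapred-site.xml might cap the task slots as
below. This assumes the Hadoop 1.x (MR1) property names, and the value
of 1 is only a placeholder; pick it based on how much CPU, memory and
disk I/O you want left free for the RS.

```xml
<!-- mapred-site.xml on the RS machines only (assumed MR1 / Hadoop 1.x) -->
<configuration>
  <!-- Max map tasks this TT runs at once; 1 is an illustrative value -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <!-- Max reduce tasks this TT runs at once; 1 is an illustrative value -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

The TT reads these settings at startup, so it would need a restart
after the change. The MR-only nodes can keep their usual, higher slot
counts.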