Re: RS, TT, shared DN and good performance on random HBase reads.
Adrien Mogenet 2012-08-25, 17:19
How would you define "low-slotted"?
A reduced scheduling capacity, to avoid a high number of mappers?
On Sat, Aug 25, 2012 at 3:32 PM, Harsh J <[EMAIL PROTECTED]> wrote:
> Hi Marc,
> On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>> The reasons for that would be:
>> - After running a full (major) compaction, HFiles end up on the RS nodes, so I
>> would achieve data locality.
>> - As I have a replication factor of 3 and just 2 HBase nodes, I know that no
>> map task would try to read from the RS nodes. The reduce tasks will write first
>> to the node where they run (which will never be an RS node).
>> - So, on the RS nodes I would end up with the HBase tables plus block replicas
>> of the MR jobs that will never be read (as maps get data locality and at least
>> one replica of each block will be on an MR node).
> Just to keep in mind: all HBase read/write requests are made via the
> RS. The HDFS blocks held on an RS node aren't directly accessed by any
> client (the RS is THE data server for HBase clients).
>> If this works, and I add more nodes running both an RS and a DataNode, could I
>> guarantee that no map task would ever read from them? (This assumes that a
>> reduce task always writes first to the node where it runs; please correct me
>> if I'm wrong, as I'm not sure about this.)
> Yes, you can guarantee this to a certain extent. If data locality
> is absent for some tasks (due to scheduling constraints), a few blocks
> may be read from the RS nodes' DNs, but that shouldn't have a big impact,
> given that a good MR scheduler usually helps avoid it.
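Once there are as many RS nodes as replicas, the replica-count argument from the first message no longer guarantees an MR-node replica, so the guarantee rests on write locality instead. HDFS's default placement policy puts the first replica on the writer's own node when the writer runs on a DataNode, so a reducer running on an MR node leaves at least one replica there. A minimal sketch of that, assuming a simplified model (first replica local, the rest on distinct random nodes; racks, full disks and decommissioning ignored) and hypothetical host names:

```python
import random


def write_block(writer, datanodes, replication, rng):
    """Simplified, hypothetical model of HDFS write placement: the first
    replica goes to the writer's own DataNode, the others to distinct
    nodes picked at random (rack awareness is ignored)."""
    others = [dn for dn in datanodes if dn != writer]
    return [writer] + rng.sample(others, replication - 1)


if __name__ == "__main__":
    rng = random.Random(7)
    rs_nodes = [f"rs{i}" for i in range(1, 6)]  # hypothetical: 5 RS+DN hosts
    mr_nodes = [f"mr{i}" for i in range(1, 4)]  # hypothetical: 3 TT+DN hosts
    datanodes = rs_nodes + mr_nodes
    for _ in range(1000):
        writer = rng.choice(mr_nodes)           # reducers only run on MR nodes
        replicas = write_block(writer, datanodes, 3, rng)
        assert replicas[0] in mr_nodes          # the local replica is on an MR node
    print("every block kept its first replica on an MR node")
```

A data-local map task will then read that MR-node replica; as the reply says, a task scheduled without locality may still read a replica from an RS node's DataNode.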
> Alternatively, you can also consider running low-slotted TTs on the RS
> machines, to make use of them in a safer way.
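The thread does not define "low-slotted"; one reading is a TaskTracker configured with far fewer slots than the machine could run, so MR work only takes a small share of an RS machine's CPU and memory. In MRv1 the per-TaskTracker slot counts come from mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml. A sketch that builds such a fragment for the RS hosts; the function name and slot values are hypothetical placeholders, not a recommendation:

```python
import xml.etree.ElementTree as ET


def tasktracker_slots_config(map_slots, reduce_slots):
    """Build a mapred-site.xml <configuration> element that caps the MRv1
    TaskTracker slots on one host (hypothetical helper)."""
    conf = ET.Element("configuration")
    for name, value in (
        ("mapred.tasktracker.map.tasks.maximum", map_slots),
        ("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return conf


if __name__ == "__main__":
    # Hypothetical slot counts for an RS machine; tune for your hardware.
    conf = tasktracker_slots_config(map_slots=1, reduce_slots=1)
    ET.indent(conf)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(conf, encoding="unicode"))
```

If the RS hosts get reduce slots at all, reducers running there would write their first replica to the RS node's DataNode, which goes against the layout assumed earlier in the thread.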
> Harsh J