HBase >> mail # user >> RS, TT, shared DN and good performance on random Hbase random reads.


Re: RS, TT, shared DN and good performance on random Hbase random reads.
Yes. What I meant was a low number of slots on just those TTs (the ones
co-located with an RS, if you want to do that), by configuring a
limited maximum number of map and reduce slots specifically on them. Or,
if you use MR2 over YARN, you must limit the NodeManager's maximum
memory usage.
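
A minimal sketch of what such limits could look like, assuming the
Hadoop 1.x TaskTracker slot properties and the YARN NodeManager memory
property. The helper `site_xml` and all numeric values are hypothetical
and only illustrate the shape of the settings; the rendered XML would go
into mapred-site.xml (MR1) or yarn-site.xml (MR2) on the RS-hosting
nodes only.

```python
import xml.etree.ElementTree as ET


def site_xml(props):
    """Render a dict of Hadoop properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# MR1: cap the slots of a TaskTracker that shares its node with a RegionServer.
# The values 2 and 1 are hypothetical; size them to the node's spare CPU and RAM.
MR1_LOW_SLOT = {
    "mapred.tasktracker.map.tasks.maximum": 2,
    "mapred.tasktracker.reduce.tasks.maximum": 1,
}

# MR2 over YARN: cap the memory the NodeManager may hand out to containers.
# 4096 MB is a hypothetical figure.
YARN_LOW_MEMORY = {
    "yarn.nodemanager.resource.memory-mb": 4096,
}

if __name__ == "__main__":
    print(site_xml(MR1_LOW_SLOT))
    print(site_xml(YARN_LOW_MEMORY))
```

Nodes that do not host an RS would keep their normal, higher limits, so
the settings above would differ per node rather than cluster-wide.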

On Sat, Aug 25, 2012 at 10:49 PM, Adrien Mogenet
<[EMAIL PROTECTED]> wrote:
> How would you define "low-slotted"?
> A limited scheduling capacity, to avoid a high number of mappers?
>
> On Sat, Aug 25, 2012 at 3:32 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> Hi Marc,
>>
>> On Sat, Aug 25, 2012 at 12:56 AM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
>>> The reasons for that would be:
>>> - After running a full compaction, HFiles end up on the RS nodes, so I
>>> would achieve data locality.
>>> - As I have replication factor 3 and just 2 HBase nodes, I know that no map
>>> task would try to read from the RS nodes. The reduce tasks will write first
>>> to the node where they run (which will never be an RS node).
>>> - So, on the RS nodes I would end up having the HBase tables and block
>>> replicas of the MR jobs that will never be read (as maps honor data locality
>>> and at least one replica of each block will be on an MR node).
>>
>> Just to keep in mind: all HBase read/write requests are made via the
>> RS. The HDFS data blocks held by the RS aren't directly accessed by
>> any client (the RS is THE data server for HBase clients).
>>
>>> If this works and I add more nodes with an RS and a datanode, could I
>>> guarantee that no map task would ever read from them? (Assuming that a
>>> reduce task always writes first to the node where it runs; correct me if
>>> I'm wrong, please, as I'm not sure about this.)
>>
>> Yes, you can guarantee this to a certain extent. If data locality
>> is absent for some tasks (due to scheduling constraints), a few blocks
>> may be served by the RS nodes' DNs, but this shouldn't have a big
>> impact, given that a good MR scheduler usually helps avoid having to
>> do that.
>>
>> Alternatively, you can also consider running low-slotted TTs to make
>> use of the RS machines in a safer way.
>>
>> --
>> Harsh J
>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me

--
Harsh J