HDFS >> mail # user >> Local block placement policy, request

Re: Local block placement policy, request
> Keep in mind there's a fair bit of subtlety to it -- eg what happens
> if you have two racks: A with 2 replicas, and B with one replica. A
> node in rack A requests a local replica. In this case we have to make
> sure that we move one of the A replicas and not the B replica (ie we
> must respect the NN's rack replication policy).

Yes, good point.  Also, I wonder how HDFS will handle the resulting
over-replication of the file: will it try to delete the over-replicated
blocks? If so, we'd need to ensure [somehow] that this doesn't happen.
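
The rack constraint in the quoted example can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not HDFS or Balancer code: the class LocalReplicaMoveSketch, the Replica type, and chooseReplicaToMove() are hypothetical names, and the request is modeled as a move (one existing replica is given up in exchange for the new local one) so that the replica count stays the same. A candidate is rejected if giving it up would reduce the number of racks the block spans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these names are HDFS or Balancer APIs.
final class LocalReplicaMoveSketch {

    /** One replica location: the DataNode holding it and that node's rack. */
    static final class Replica {
        final String node;
        final String rack;

        Replica(String node, String rack) {
            this.node = node;
            this.rack = rack;
        }
    }

    /** Counts the distinct racks covered by a list of replicas. */
    static int rackCount(List<Replica> replicas) {
        Set<String> racks = new HashSet<>();
        for (Replica r : replicas) {
            racks.add(r.rack);
        }
        return racks.size();
    }

    /**
     * Picks the existing replica to give up so that targetNode can hold one,
     * without reducing the number of racks the block spans.
     */
    static Optional<Replica> chooseReplicaToMove(
            List<Replica> replicas, String targetNode, String targetRack) {
        for (Replica r : replicas) {
            if (r.node.equals(targetNode)) {
                return Optional.empty(); // already local, nothing to move
            }
        }
        int racksBefore = rackCount(replicas);
        Replica fallback = null;
        for (Replica candidate : replicas) {
            // Simulate the move: candidate leaves, target node gains the replica.
            List<Replica> after = new ArrayList<>(replicas);
            after.remove(candidate);
            after.add(new Replica(targetNode, targetRack));
            if (rackCount(after) < racksBefore) {
                continue; // would lose a rack, e.g. moving the lone rack-B replica
            }
            if (candidate.rack.equals(targetRack)) {
                return Optional.of(candidate); // same-rack move keeps the rack layout
            }
            if (fallback == null) {
                fallback = candidate; // acceptable, but crosses racks
            }
        }
        return Optional.ofNullable(fallback);
    }

    public static void main(String[] args) {
        // Rack A holds two replicas, rack B holds one; dn4 in rack A wants a local copy.
        List<Replica> replicas = Arrays.asList(
                new Replica("dn1", "A"), new Replica("dn2", "A"), new Replica("dn3", "B"));
        Optional<Replica> choice = chooseReplicaToMove(replicas, "dn4", "A");
        System.out.println(choice.map(r -> r.node).orElse("none")); // prints "dn1"
    }
}
```

With the layout from the example, the rack-B replica on dn3 is never chosen for a rack-A target, because giving it up would leave every replica in rack A. The real NameNode placement policy has more rules than this distinct-rack check.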

On Thu, May 26, 2011 at 12:30 PM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
> On Thu, May 26, 2011 at 12:02 PM, Jason Rutherglen
> <[EMAIL PROTECTED]> wrote:
>> Todd, thanks!
>>
>>> In general, though, keep in mind that, whenever you write data, you'll
>>> get a local copy first, if the writer is in the cluster. That's how
>>> HBase gets locality for most of its accesses
>>
>> Right.  However in the failover scenario where a node goes down
>> (hardware failure, or either of the processes, such as the DataNode,
>> RegionServer, etc), then I think the new RS will not have local data?
>> We could first make a request that all necessary HDFS files go local
>> prior to the new RS being available.  At least for search to work this
>> is a requirement.
>
> Yep, we've thrown this idea around before, but I'm not sure if
> there's an HBASE JIRA for it or not.
>
>>
>>> There are some non-public APIs to do this -- have a look at how the
>>> Balancer works - the dispatch() function is the guts you're looking
>>> for. It might be nice to expose this functionality as a "limited
>>> private evolving" API
>>
>> Perhaps simply mark them as 'expert' or make them package private?
>> I'll work on a patch.
>
> Sounds good.
>
> Keep in mind there's a fair bit of subtlety to it -- eg what happens
> if you have two racks: A with 2 replicas, and B with one replica. A
> node in rack A requests a local replica. In this case we have to make
> sure that we move one of the A replicas and not the B replica (ie we
> must respect the NN's rack replication policy).
>
> -Todd
>
>> On Thu, May 26, 2011 at 11:40 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:
>>> Hey Jason,
>>>
>>> There are some non-public APIs to do this -- have a look at how the
>>> Balancer works - the dispatch() function is the guts you're looking
>>> for. It might be nice to expose this functionality as a "limited
>>> private evolving" API.
>>>
>>> In general, though, keep in mind that, whenever you write data, you'll
>>> get a local copy first, if the writer is in the cluster. That's how
>>> HBase gets locality for most of its accesses.
>>>
>>> -Todd
>>>
>>> On Thu, May 26, 2011 at 11:36 AM, Jason Rutherglen
>>> <[EMAIL PROTECTED]> wrote:
>>>> Is there a way to send a request to the name node to replicate
>>>> block(s) to a specific DataNode?  If not, what would be a way to do
>>>> this?  -Thanks
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>