Re: Distributed Drill question
We're a bit lacking in docs, sorry about that.

Drill maintains the concept of host affinity for individual operations.  In
the case of scans, this is typically associated with the locality
information of the HDFS blocks or HBase region servers.  Drillbits are
designed to be run next to the storage processes and have awareness of this
information.
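
To make the idea of host affinity concrete, here is a minimal sketch of locality-aware assignment of scan work. It is not Drill's actual scheduler. The function name `assign_by_affinity`, the block ids, and the host names are all hypothetical, and the tie-breaking rule (least-loaded host first) is an assumption made for illustration.

```python
def assign_by_affinity(blocks, drillbit_hosts):
    """Assign each block to a drillbit, preferring hosts that hold a replica.

    blocks: dict mapping a block id to the list of hosts storing a replica
            (hypothetical input; in practice this would come from the
            storage layer's locality information).
    drillbit_hosts: list of hosts that run a drillbit.
    Returns a dict mapping each drillbit host to its assigned block ids.
    """
    hosts = list(drillbit_hosts)
    if not hosts:
        raise ValueError("at least one drillbit host is required")

    assignments = {host: [] for host in hosts}
    for block_id in sorted(blocks):
        replicas = blocks[block_id]
        # Drillbits co-located with a replica can read the block locally.
        local = [host for host in hosts if host in replicas]
        # Fall back to any drillbit (a remote read) when none is local.
        candidates = local or hosts
        # Assumed policy: pick the least-loaded candidate, first one on ties.
        target = min(candidates, key=lambda host: len(assignments[host]))
        assignments[target].append(block_id)
    return assignments


if __name__ == "__main__":
    blocks = {
        "b1": ["node1", "node2"],
        "b2": ["node2", "node3"],
        "b3": ["node4"],  # no drillbit runs on node4
    }
    print(assign_by_affinity(blocks, ["node1", "node2", "node3"]))
```

In this sketch, blocks b1 and b2 go to drillbits that hold a local replica, while b3 has no co-located drillbit and falls back to a remote read on the least-loaded host.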

Does that answer your question?

Thanks,
Jacques
On Wed, Oct 30, 2013 at 2:43 AM, Tom Seddon <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I would like to know more about how Drill's parallel processing of queries
> relates, if at all, to the parallel nature of a data source such as
> Hadoop.  Am I correct in thinking that if a Drill cluster is querying data
> from a Hadoop cluster, the drillbits are unaware of where the data
> resides in HDFS, as their interaction is through the NameNode?  If this is
> the case, how does scaling Drill out help performance if it's always having
> to route through the NameNode?
>
> Sorry if this is a silly question.  I've tried to find the answer by
> reading the documentation and the mailing list, but I'm still not clear on
> it.
>
> Thanks,
>
> Tom
>