Drill, mail # dev - Query HDFS


Timothy Chen 2013-10-17, 23:10
Tom Seddon 2013-10-19, 15:20
Steven Phillips 2013-10-21, 21:10
Re: Query HDFS
Timothy Chen 2013-10-21, 21:12
So does Drill try to contact HDFS through localhost then?

I would imagine it needs to know the namenode location to start the HDFS
connection.

Tim
On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips <[EMAIL PROTECTED]> wrote:

> This is configured as part of the storage engine. For example, if you are
> submitting a physical plan directly, you would set the dfsName property to:
> hdfs://<namenode host>:<port>/
>
> If submitting a SQL query through sqlline, you should modify the
> storage-engines.json in the conf directory. For example, modify the
> "parquet" config to this:
>
> "parquet" :
>       {
>         "type":"parquet",
>         "dfsName" : "hdfs://<namenode host>:<port>/"
>       }
>
>
> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[EMAIL PROTECTED]>
> wrote:
>
> > Hi,
> >
> > I'm also interested in querying data residing in HDFS.  Grateful for any
> > advice on how to achieve this.
> >
> > Thanks,
> >
> > Tom
> >
> >
> >
> > On 18 October 2013 00:10, Timothy Chen <[EMAIL PROTECTED]> wrote:
> >
> >> Hey Steven/Jacques,
> >>
> >> If I want to query data that resides in HDFS, how do I query it in sqlline?
> >>
> >> And how do I specify which HDFS namenode it should connect to for data?
> >>
> >> Since I got Drill deployable to EC2, I'm thinking of hooking up the
> >> AMPLab Benchmark dataset to see how we perform. It first needs to be
> >> copied from S3 to a distributed file system, as one node won't be able
> >> to contain it.
> >>
> >> Thanks!
> >>
> >> Tim
> >>
> >
> >
>
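
For anyone scripting the storage-engines.json change described in the quoted reply, here is a minimal Python sketch. It only touches the "parquet" fragment shown in that reply; the helper name set_dfs_name, the starting value "file:///", the host namenode.example.com, and the default port 8020 are all illustrative assumptions, and the real file may nest the engine entries differently.

```python
import json


def set_dfs_name(config_text, engine, namenode_host, namenode_port=8020):
    """Return the JSON text with the engine's dfsName set to an hdfs:// URI.

    Assumes engine entries sit at the top level of the JSON object, as in the
    fragment quoted in the email. The port default of 8020 is an assumption;
    use whatever port your namenode actually listens on.
    """
    config = json.loads(config_text)
    if engine not in config:
        raise KeyError(f"no storage engine named {engine!r}")
    config[engine]["dfsName"] = f"hdfs://{namenode_host}:{namenode_port}/"
    return json.dumps(config, indent=2)


# Hypothetical starting fragment; "file:///" is a placeholder value.
example = '{"parquet": {"type": "parquet", "dfsName": "file:///"}}'
print(set_dfs_name(example, "parquet", "namenode.example.com"))
```

The function returns the new text instead of writing to disk, so the result can be reviewed before replacing the file in the conf directory.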
Timothy Chen 2013-10-21, 21:14
Steven Phillips 2013-10-21, 21:17