Drill, mail # dev - Query HDFS


Re: Query HDFS
Steven Phillips 2013-10-21, 21:17
You might also try querying data in S3 by using the s3 URI.
On Mon, Oct 21, 2013 at 2:14 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:

> Nvm, for some reason I didn't catch the <namenode host ip> :)
>
> I'll try this out with the AMPlab data set.
>
> Tim
>
>
> On Mon, Oct 21, 2013 at 2:12 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:
>
> > So does Drill try to contact HDFS through localhost then?
> >
> > I would imagine it needs to know the namenode location to start the HDFS
> > connection.
> >
> > Tim
> >
> >
> > On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips <[EMAIL PROTECTED]> wrote:
> >
> >> This is configured as part of the storage engine. For example, if you are
> >> submitting a physical plan directly, you would set the dfsName property to:
> >> hdfs://<namenode host:ip>/
> >>
> >> If submitting a sql query through sqlline, you should modify the
> >> storage-engines.json in the conf directory. For example, modify the
> >> "parquet" config to this:
> >>
> >> "parquet" :
> >>       {
> >>         "type":"parquet",
> >>         "dfsName" : "hdfs://<namenode host:ip>/"
> >>       }
> >>
> >>
> >> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[EMAIL PROTECTED]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I'm also interested in querying data residing in HDFS.  Grateful for any
> >> > advice on how to achieve this.
> >> >
> >> > Thanks,
> >> >
> >> > Tom
> >> >
> >> >
> >> >
> >> > On 18 October 2013 00:10, Timothy Chen <[EMAIL PROTECTED]> wrote:
> >> >
> >> >> Hey Steven/Jacques,
> >> >>
> >> >> If I want to query data that resides in HDFS, how do I query it in sqlline?
> >> >>
> >> >> And how do I specify which HDFS namenode it should connect to for data?
> >> >>
> >> >> Since I got Drill deployable to EC2, I'm currently thinking of hooking up
> >> >> the AMPLab Benchmark dataset to see how we perform. That requires copying
> >> >> the dataset from s3 to a distributed file system first, as one node won't
> >> >> be able to contain it.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> Tim
> >> >>
> >> >
> >> >
> >>
> >
> >
>
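
For anyone following the thread, here is a minimal sketch of generating the "parquet" storage-engine entry that Steven describes above. It only builds the two keys quoted in the thread ("type" and "dfsName"). The helper name parquet_engine_config, the example hostname, and port 8020 are illustrative assumptions, as is reading the <namenode host:ip> placeholder as a host and port pair. The rest of storage-engines.json is not shown in the thread, so this does not try to write the file.

```python
import json


def parquet_engine_config(namenode_host: str, namenode_port: int) -> dict:
    """Build the "parquet" storage-engine entry quoted in the thread.

    Hypothetical helper: it assumes the <namenode host:ip> placeholder
    stands for the namenode's host and port.
    """
    return {
        "type": "parquet",
        "dfsName": f"hdfs://{namenode_host}:{namenode_port}/",
    }


if __name__ == "__main__":
    # Example values only; substitute your own namenode address.
    entry = {"parquet": parquet_engine_config("namenode.example.com", 8020)}
    print(json.dumps(entry, indent=2))
```

The printed fragment can then be merged by hand into the storage-engines.json in the conf directory, as described in the quoted message.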