Drill >> mail # dev >> Query HDFS


Re: Query HDFS
Nvm, for some reason I didn't catch the <namenode host ip> :)

I'll try this out with the AMPlab data set.

Tim
On Mon, Oct 21, 2013 at 2:12 PM, Timothy Chen <[EMAIL PROTECTED]> wrote:

> So does Drill try to contact HDFS through localhost then?
>
> I would imagine it needs to know the namenode location to start the HDFS
> connection.
>
> Tim
>
>
> On Mon, Oct 21, 2013 at 2:10 PM, Steven Phillips <[EMAIL PROTECTED]> wrote:
>
>> This is configured as part of the storage engine. For example, if you are
>> submitting a physical plan directly, you would set the dfsName property
>> to:
>> hdfs://<namenode host>:<port>/
>>
>> If submitting a SQL query through sqlline, you should modify the
>> storage-engines.json in the conf directory. For example, modify the
>> "parquet" config to this:
>>
>> "parquet" :
>>       {
>>         "type":"parquet",
>>         "dfsName" : "hdfs://<namenode host>:<port>/"
>>       }
>>
>>
>> On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[EMAIL PROTECTED]>
>> wrote:
>>
>> > Hi,
>> >
>> > I'm also interested in querying data residing in HDFS.  Grateful for any
>> > advice on how to achieve this.
>> >
>> > Thanks,
>> >
>> > Tom
>> >
>> >
>> >
>> > On 18 October 2013 00:10, Timothy Chen <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hey Steven/Jacques,
>> >>
>> >> If I want to query data that resides in HDFS, how do I query it in sqlline?
>> >>
>> >> And how do I specify which HDFS namenode it should connect to for data?
>> >>
>> >> Since I got Drill deployable to EC2, I'm currently thinking of hooking up the
>> >> AMPLab benchmark dataset to see how we perform, and the dataset needs to be
>> >> copied from S3 to a distributed file system first, as one node won't be
>> >> able to contain it.
>> >>
>> >> Thanks!
>> >>
>> >> Tim
>> >>
>> >
>> >
>>
>
>
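
For anyone scripting the storage-engines.json change Steven describes, here is a minimal Python sketch. It only shows how the dfsName value is assembled and set on an engine entry. The helper name `set_dfs_name`, the host `namenode.example.com`, port 8020, and the starting `file:///` value are all assumptions for illustration, not taken from Drill itself, and the real file may nest the engine entries differently.

```python
import copy
import json


def set_dfs_name(engines, engine, host, port):
    """Return a copy of `engines` with `dfsName` for `engine` pointing at HDFS.

    `engines` is assumed to map engine names to config dicts, like the
    "parquet" entry quoted in the thread. This helper is hypothetical.
    """
    updated = copy.deepcopy(engines)  # leave the caller's config untouched
    updated[engine]["dfsName"] = f"hdfs://{host}:{port}/"
    return updated


# Hypothetical starting config; the "file:///" value is an assumption.
engines = {"parquet": {"type": "parquet", "dfsName": "file:///"}}

# Host and port are placeholders; substitute your own namenode address.
updated = set_dfs_name(engines, "parquet", "namenode.example.com", 8020)
print(json.dumps(updated, indent=2))
```

In practice you would load the existing file with `json.load`, apply the change to whichever object holds the engine entries, and write the result back with `json.dump`.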