Drill, mail # dev - Query HDFS


Re: Query HDFS
Steven Phillips 2013-10-21, 21:10
This is configured as part of the storage engine configuration. For example, if
you are submitting a physical plan directly, set the dfsName property to:
hdfs://<namenode host>:<port>/
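The dfsName value is an ordinary hdfs:// URI pointing at the namenode. The following is a minimal sketch for building and sanity-checking that string. The helper name make_dfs_name and the example host and port (namenode.example.com, 8020) are hypothetical; use your own cluster's namenode address and RPC port.

```python
from urllib.parse import urlparse


def make_dfs_name(namenode_host: str, namenode_port: int) -> str:
    """Build a dfsName URI of the form hdfs://<host>:<port>/ (hypothetical helper)."""
    if not namenode_host:
        raise ValueError("namenode host must be non-empty")
    if not 0 < namenode_port < 65536:
        raise ValueError(f"invalid port: {namenode_port}")
    uri = f"hdfs://{namenode_host}:{namenode_port}/"
    # Round-trip through urlparse to confirm the scheme and port came out as intended.
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or parsed.port != namenode_port:
        raise ValueError(f"could not build a valid hdfs URI from {uri!r}")
    return uri


if __name__ == "__main__":
    # Hypothetical namenode address, for illustration only.
    print(make_dfs_name("namenode.example.com", 8020))
```

With the example values this prints hdfs://namenode.example.com:8020/, which is the string you would put in dfsName.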

If you are submitting an SQL query through sqlline, modify
storage-engines.json in the conf directory instead. For example, change the
"parquet" config to:

"parquet" :
      {
        "type":"parquet",
        "dfsName" : "hdfs://<namenode host>:<port>/"
      }
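If you prefer to script the storage-engines.json change rather than edit it by hand, here is a minimal sketch. It assumes the file is plain JSON and that the engine entries (including "parquet") sit either under a top-level "storage" object or directly at the top level; both layouts are handled because the exact file layout is an assumption here. The function names are hypothetical.

```python
import json
from pathlib import Path


def set_parquet_dfs_name(config_text: str, dfs_name: str) -> str:
    """Return the config text with the "parquet" engine's dfsName set.

    Assumption: engines live under a top-level "storage" object if present,
    otherwise directly at the top level of the JSON document.
    """
    config = json.loads(config_text)
    engines = config.get("storage", config)
    # Create a parquet entry if none exists, keeping "type" as in the example config.
    parquet = engines.setdefault("parquet", {"type": "parquet"})
    parquet["dfsName"] = dfs_name
    return json.dumps(config, indent=2)


def update_storage_engines_file(path: str, dfs_name: str) -> None:
    """Rewrite the file in place (hypothetical helper; back the file up first)."""
    file = Path(path)
    updated = set_parquet_dfs_name(file.read_text(), dfs_name)
    file.write_text(updated + "\n")
```

For example, update_storage_engines_file("conf/storage-engines.json", "hdfs://namenode.example.com:8020/") would point the parquet engine at a hypothetical namenode. Because the file is re-serialized, its original formatting and key spacing are not preserved.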
On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm also interested in querying data stored in HDFS. I'd be grateful for any
> advice on how to do this.
>
> Thanks,
>
> Tom
>
>
>
> On 18 October 2013 00:10, Timothy Chen <[EMAIL PROTECTED]> wrote:
>
>> Hey Steven/Jacques,
>>
>> If I want to query data that resides in HDFS, how do I do that in sqlline?
>>
>> And how do I specify which HDFS namenode it should connect to for data?
>>
>> Since I got Drill deployable to EC2, I'm thinking of hooking up the
>> AMPLab benchmark dataset to see how we perform. That requires copying the
>> dataset from S3 to a distributed file system first, because a single node
>> can't hold it.
>>
>> Thanks!
>>
>> Tim
>>
>
>