Drill >> mail # dev >> Query HDFS


This is configured as part of the storage engine. For example, if you are
submitting a physical plan directly, you would set the dfsName property to:
hdfs://<namenode host>:<port>/

If you are submitting a SQL query through sqlline, modify
storage-engines.json in the conf directory. For example, change the
"parquet" config to this:

"parquet" :
      {
        "type" : "parquet",
        "dfsName" : "hdfs://<namenode host>:<port>/"
      }

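As a rough illustration of the dfsName value described above, here is a minimal Python sketch that builds the URI from a NameNode host and port and sets it on one storage-engine entry. The helper name with_dfs_name, the host namenode.example.com, the starting "file:///" value, and the default port 8020 are all assumptions made for the example, not something Drill provides; use whatever host and RPC port your NameNode actually listens on.

```python
import json


def with_dfs_name(engine_config, namenode_host, namenode_port=8020):
    """Return a copy of one storage-engine entry pointing at an HDFS NameNode.

    Hypothetical helper for illustration only. The default port 8020 is an
    assumption; check your cluster's NameNode RPC port.
    """
    updated = dict(engine_config)  # shallow copy, leaves the input untouched
    updated["dfsName"] = f"hdfs://{namenode_host}:{namenode_port}/"
    return updated


# Placeholder starting entry; the initial dfsName value is assumed.
parquet = {"type": "parquet", "dfsName": "file:///"}

# namenode.example.com is a placeholder host name.
parquet_on_hdfs = with_dfs_name(parquet, "namenode.example.com")

# Print the entry in the same shape as the "parquet" config above.
print(json.dumps({"parquet": parquet_on_hdfs}, indent=2))
```

The printed entry can then be pasted over the existing "parquet" section of storage-engines.json by hand; the sketch deliberately does not read or write that file, since its full layout is not shown here.
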
On Sat, Oct 19, 2013 at 8:20 AM, Tom Seddon <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I'm also interested in querying data residing in HDFS.  Grateful for any
> advice on how to achieve this.
>
> Thanks,
>
> Tom
>
>
>
> On 18 October 2013 00:10, Timothy Chen <[EMAIL PROTECTED]> wrote:
>
>> Hey Steven/Jacques,
>>
>> If I want to query data that resides in HDFS, how do I do this in sqlline?
>>
>> And how do I specify which HDFS namenode it should connect to for data?
>>
>> Since I got Drill deployable to EC2, I'm currently thinking of hooking up
>> the AMPLab benchmark dataset to see how we perform. The dataset first
>> needs to be copied from S3 to a distributed file system, as one node won't
>> be able to hold it.
>>
>> Thanks!
>>
>> Tim
>>
>
>