Drill >> mail # user >> Schema discovery


Timothy Chen 2013-10-24, 18:05
Re: Schema discovery
A recent blog post by Daniel Abadi has a similar theme:

http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/

We could create a tool that scans the raw files and generates an Optiq schema that contains views that apply "late schema" (the "EMP" and "DEPT" views in https://raw.github.com/apache/incubator-drill/HEAD/sqlparser/src/test/resources/test-models.json are examples of this). The user could interactively modify that schema (e.g. change a column's type from string to boolean or integer).
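To make the idea concrete, here is a rough sketch of what such a scanning tool might do. It is only an illustration, not existing Drill or Optiq code: the function names, the type-guessing rules, the `_MAP` column, and the exact layout of the generated model JSON are all assumptions loosely modeled on the test-models.json file linked above. The `overrides` argument stands in for the user's interactive edits.

```python
import json


def infer_type(values):
    """Guess a SQL type from raw string values (hypothetical heuristic)."""
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "VARCHAR"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    for sql_type, parser in (("INTEGER", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                parser(v)
            return sql_type
        except ValueError:
            continue
    return "VARCHAR"


def discover_columns(records):
    """Scan raw records (dicts of strings) and return {column: guessed type}."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, []).append(value)
    return {name: infer_type(vals) for name, vals in seen.items()}


def build_view_sql(columns, source, overrides=None):
    """Build a "late schema" view that casts each raw field to its type.

    Assumes the raw table exposes one map column called _MAP, and that
    column names contain no quote characters (a simplification).
    """
    types = dict(columns)
    types.update(overrides or {})  # user-supplied corrections win
    parts = [
        "CAST(_MAP['{0}'] AS {1}) AS \"{0}\"".format(name, sql_type)
        for name, sql_type in types.items()
    ]
    return "SELECT " + ", ".join(parts) + " FROM " + source


def build_model(schema_name, view_name, sql):
    """Wrap the view in a model document (layout is an assumption)."""
    return {
        "version": "1.0",
        "defaultSchema": schema_name,
        "schemas": [
            {
                "name": schema_name,
                "tables": [{"name": view_name, "type": "view", "sql": sql}],
            }
        ],
    }


if __name__ == "__main__":
    sample = [
        {"name": "Ann", "deptno": "10", "active": "true"},
        {"name": "Bob", "deptno": "20", "active": "false"},
    ]
    cols = discover_columns(sample)
    sql = build_view_sql(cols, '"RAW"."EMP_RAW"')
    print(json.dumps(build_model("HR", "EMP", sql), indent=2))
```

Passing something like `overrides={"deptno": "VARCHAR"}` to `build_view_sql` regenerates the view with the user's chosen type, and nothing in the engine has to change.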

It's a nice approach because it doesn't impact the Drill engine, and metadata and data should be kept separate wherever possible.

Julian
Timothy Chen 2013-11-02, 05:51