Drill, mail # dev - Re: Schema discovery


Re: Schema discovery
Dhruv 2013-11-03, 06:18
Hi,
     I think we might be able to reuse the schema discovery features of
Apache MetaModel (http://metamodel.incubator.apache.org/).
     Although it is still in the incubator, its project page at
http://metamodel.eobjects.org/goals.html suggests it is fairly mature
(versioned up to 3.4).
     One of its most distinguishing features is "Traversing and building
the *structure* of datastores", which matches our goal of "late
schema" support.
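     For illustration only, here is a minimal Python sketch of the kind of
structure discovery described above, applied to a raw CSV file. It is not
MetaModel's API: `infer_column_types`, `_infer_type`, the type names, and
the sample data are hypothetical. It assumes the file has a header row and
guesses each column's type from a sample of rows.

```python
import csv
import io


def _infer_type(values):
    # Hypothetical rule set: BOOLEAN if every value is true/false,
    # else INTEGER, else DOUBLE, else fall back to the raw string.
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"  # nothing to go on; keep the raw string
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "BOOLEAN"
    try:
        for v in non_empty:
            int(v)
        return "INTEGER"
    except ValueError:
        pass
    try:
        for v in non_empty:
            float(v)
        return "DOUBLE"
    except ValueError:
        return "VARCHAR"


def infer_column_types(csv_text, sample_rows=100):
    """Read the header plus up to `sample_rows` rows; return {column: type}."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in zip(header, row):
            columns[name].append(value.strip())
    return {name: _infer_type(vals) for name, vals in columns.items()}


sample = "id,name,score,active\n1,alpha,2.5,true\n2,beta,3,false\n"
print(infer_column_types(sample))
# {'id': 'INTEGER', 'name': 'VARCHAR', 'score': 'DOUBLE', 'active': 'BOOLEAN'}
```

     Sampling keeps the scan cheap on large files, at the cost that a bad
value beyond the sample can still break a guessed type; that is one reason
to let the user correct the result afterwards.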

-Dhruv
On 11/02/2013 11:21 AM, Timothy Chen wrote:
> Hi Julian,
>
> Glad to have someone respond to this :) Yes, I think going beyond just
> having no schema defined up front to actually offering users possible
> schemas is definitely a much better interactive experience.
>
> I would imagine, though, that it could impact Drill, or perhaps we would
> need to build more statistics capabilities into Drill to query schema info.
> Not all data is raw files; some of it could be living in different data
> stores, so I think we would need to go through the Drill storage engine
> abstraction to get that info.
>
> I'll chat about this with Jacques and folks next Monday or in the Drill
> user group.
>
> Tim
>
>
> On Fri, Nov 1, 2013 at 4:51 PM, Julian Hyde <[EMAIL PROTECTED]> wrote:
>
>> A recent blog post by Daniel Abadi has a similar theme:
>>
>>
>> http://hadapt.com/blog/2013/10/28/all-sql-on-hadoop-solutions-are-missing-the-point-of-hadoop/
>>
>> We could create a tool that scans the raw files and generates an Optiq
>> schema that contains views that apply "late schema" (the "EMP" and "DEPT"
>> views in
>> https://raw.github.com/apache/incubator-drill/HEAD/sqlparser/src/test/resources/test-models.json
>> are examples of this). The user could interactively modify that schema
>> (e.g. change a column's type from string to boolean or integer).
>>
>> It's a nice approach because it doesn't impact the Drill engine. This is
>> good. Metadata and data should be kept separate wherever possible.
>>
>> Julian

Timothy Chen 2013-11-03, 22:21