Drill terminology issue: "schema"
We talked to Rahul about this last week, and we think that the term *schema*
is overloaded and misleading for several reasons:

   1. The meaning of the term, generally speaking, depends on the context.
   2. A *schema* usually refers to the type and structure of data in a data
   source, but Drill operates on data that is "schema-less" or
   "self-describing."
   3. We have commands that refer to schemas, workspaces, file systems, and
   databases interchangeably.
   4. A *schema* in the relational database world is a namespace within a
   single physical database. Drill does not use the term in this way.

Below are some specific examples where the use of the term *schema* does
not seem appropriate or accurate. It's a little bit tricky to explain
clearly. Please let me and Bridget know if you think we should try to
change the terminology, and we'll open a Jira.

1. Does it make sense to set a *workspace* or a *database* name with the
USE command and have it be called the "default schema"?

0: jdbc:drill:> use hive.`default`;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| true       | Default schema changed to 'hive.default' |
+------------+------------+
1 row selected (0.113 seconds)

2. The SHOW DATABASES command returns a list of "schemas", not databases?

0: jdbc:drill:> show databases;
+-------------+
| SCHEMA_NAME |
+-------------+
| hive.default |
| dfs.default |
....

3. The SHOW SCHEMAS command returns the same list, which is really a list of
workspaces and databases (workspaces for file systems, databases for Hive
and HBase):

0: jdbc:drill:> show schemas;
+-------------+
| SCHEMA_NAME |
+-------------+
| hive.default |
| dfs.default |
| dfs.root    |
| dfs.tmp     |
| sys         |
| MFS.default |
| MFS.json_click |
| lab.default |
| lab.root    |
| lab.views   |
| lab.clicks  |
| hbase       |
| INFORMATION_SCHEMA |
+-------------+
13 rows selected (0.114 seconds)

4. What is a "Drill schema" versus a "non-Drill schema"? This non-Drill
schema appears in the SHOW SCHEMAS list.

0: jdbc:drill:> create table mobile as select * from MFS.json_click.`mobile.json` limit 2;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| false      | Error: Current schema is not a Drill schema. Can't create new relations (tables or views) in non-Drill schemas. |
+------------+------------+
1 row selected (0.086 seconds)

0: jdbc:drill:> show schemas;
+-------------+
| SCHEMA_NAME |
+-------------+
| hive.default |
| dfs.default |
| dfs.root    |
| dfs.tmp     |
| sys         |
| MFS.default |
| MFS.json_click |
....
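
As a rough illustration of point 3, the sketch below (plain Python, not part
of Drill) takes the names from the full SHOW SCHEMAS listing in that example,
splits each one at its first dot, and groups them by the first part. The
function names are hypothetical, and reading the first part as a data source
and the second part as a workspace or database is only an assumption drawn
from the listing, not a statement about Drill internals.

```python
# Names copied from the 13-row SHOW SCHEMAS listing in example 3.
SCHEMA_NAMES = [
    "hive.default", "dfs.default", "dfs.root", "dfs.tmp", "sys",
    "MFS.default", "MFS.json_click", "lab.default", "lab.root",
    "lab.views", "lab.clicks", "hbase", "INFORMATION_SCHEMA",
]


def split_schema_name(name):
    """Split a listed name at its first dot.

    Returns (first_part, second_part), where second_part is None when the
    name contains no dot. What each part *means* is an assumption.
    """
    first, sep, rest = name.partition(".")
    return first, (rest if sep else None)


def group_by_first_part(names):
    """Group second parts under their first part, keeping listing order."""
    groups = {}
    for name in names:
        first, second = split_schema_name(name)
        groups.setdefault(first, [])
        if second is not None:
            groups[first].append(second)
    return groups


if __name__ == "__main__":
    for first, seconds in group_by_first_part(SCHEMA_NAMES).items():
        print(first, seconds)
```

Names without a dot, such as sys, hbase, and INFORMATION_SCHEMA, end up with
an empty list, which shows that a single "schema" column mixes one-level and
two-level names.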

 