Drill dev mailing list
Drill terminology issue: "schema"

We talked to Rahul about this last week, and we think that the term *schema*
is overloaded and misleading for several reasons:

   1. The meaning of the term, generally speaking, depends on the context.
   2. A *schema* usually refers to the type and structure of data in a data
   source, but Drill operates on data that is "schema-less" or
   "self-describing."
   3. We have commands that refer to schemas, workspaces, file systems, and
   databases interchangeably.
   4. A *schema* in the relational database world is a namespace within a
   single physical database. Drill does not use the term in this way.

Below are some specific examples where the use of the term *schema* does
not seem appropriate or accurate. It's a little tricky to explain
clearly. Please let Bridget and me know if you think we should try to
change the terminology, and we'll open a Jira.

1. Does it make sense to set a *workspace* or a *database* name with the
USE command and have Drill call it the "default schema"?

0: jdbc:drill:> use hive.`default`;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| true       | Default schema changed to 'hive.default' |
+------------+------------+
1 row selected (0.113 seconds)

2. Does it make sense that the SHOW DATABASES command returns a list of
"schemas" rather than databases?

0: jdbc:drill:> show databases;
+-------------+
| SCHEMA_NAME |
+-------------+
| hive.default |
| dfs.default |
....

3. The SHOW SCHEMAS command returns the same list, which is really a list
of workspaces and databases (workspaces for file systems, databases for
Hive and HBase):

0: jdbc:drill:> show schemas;
+-------------+
| SCHEMA_NAME |
+-------------+
| hive.default |
| dfs.default |
| dfs.root    |
| dfs.tmp     |
| sys         |
| MFS.default |
| MFS.json_click |
| lab.default |
| lab.root    |
| lab.views   |
| lab.clicks  |
| hbase       |
| INFORMATION_SCHEMA |
+-------------+
13 rows selected (0.114 seconds)

4. What is a "Drill schema" versus a "non-Drill schema"? This non-Drill
schema appears in the SHOW SCHEMAS list.

0: jdbc:drill:> create table mobile as select * from MFS.json_click.`mobile.json` limit 2;
+------------+------------+
|     ok     |  summary   |
+------------+------------+
| false      | Error: Current schema is not a Drill schema. Can't create new relations (tables or views) in non-Drill schemas. |
+------------+------------+
1 row selected (0.086 seconds)

0: jdbc:drill:> show schemas;
+-------------+
| SCHEMA_NAME |
+-------------+
| hive.default |
| dfs.default |
| dfs.root    |
| dfs.tmp     |
| sys         |
| MFS.default |
| MFS.json_click |
....