Sqoop >> mail # dev >> Sqoop2 questions


Sqoop2 questions
Hi Sqoop developers,

Thinking about the Sqoop2 client interface, I have a couple of questions
that I'd like to ask everyone.

*1. CLI vs. Web UI*

Currently, we have an MForm type that represents a group of questions. For
example, an import job MForm consists of several MInputs, such as table name,
target dir, etc.

This makes perfect sense for the Web UI, since we can present a group of
questions in a single form, but I don't think it fits well with the CLI.
For example, in the Web UI, we can ask multiple questions in a single form as
follows:

Table name: ___
Target dir: ___
Columns: ___

But in the CLI, we can't really do the same. Instead, we're forced to ask
one question at a time:

Table name? <enter>
Target dir? <enter>
Columns? <enter>

Granted, we could simply iterate over the MInputs of an MForm in the CLI, but
I am wondering if we can have a better logical representation that fits well
with both UIs.
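For concreteness, the flat-list approach in a CLI can be sketched as below. `SimpleInput`, `SimpleForm`, and `CliFormPrompter` are hypothetical, simplified stand-ins (not the actual Sqoop2 MInput/MForm classes), and the "Label? " prompt format is an assumption; the sketch only shows that a CLI ends up walking the list one question per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for MInput: one question.
class SimpleInput {
    final String name;   // key under which the answer is stored
    final String label;  // text shown to the user

    SimpleInput(String name, String label) {
        this.name = name;
        this.label = label;
    }
}

// Hypothetical, simplified stand-in for MForm: an ordered, flat list of inputs.
class SimpleForm {
    final String name;
    final List<SimpleInput> inputs;

    SimpleForm(String name, List<SimpleInput> inputs) {
        this.name = name;
        this.inputs = inputs;
    }
}

class CliFormPrompter {
    // Asks every input of the form in list order, one prompt per line,
    // and returns the answers keyed by input name (insertion-ordered).
    static Map<String, String> ask(SimpleForm form, Reader in, PrintStream out) {
        BufferedReader reader = new BufferedReader(in);
        Map<String, String> answers = new LinkedHashMap<>();
        try {
            for (SimpleInput input : form.inputs) {
                out.print(input.label + "? ");
                out.flush();
                String line = reader.readLine(); // null at end of input
                answers.put(input.name, line == null ? "" : line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return answers;
    }
}
```

Reading from a `Reader` rather than `System.in` directly keeps the sketch testable with a `StringReader`; a Web UI would instead render all of `form.inputs` at once.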

*2. Dependency among options*

Given that ease of use is one of the design goals of Sqoop2, we should be able
to guide users through the various options by asking relevant questions based
on their previous answers. For example, in an import job, we may ask different
questions depending on whether or not it is a Hive import.

Hive import?
  Yes --> Hive table name?
  No --> no further question

To do this, I think we need some sort of dependency graph to represent
options, and a tree structure (thanks, Bilung, for the suggestion) makes more
sense than a list. (The current implementation of MForm is a list.)
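The Hive example can be sketched as a small question tree. `QuestionNode` and `QuestionTreeWalker` are hypothetical names, and keying follow-up questions by the exact answer string ("yes"/"no") is an assumption made for brevity; a real design might attach a predicate instead. The walker asks a question, records the answer, and descends only into the branch that answer selects, so irrelevant follow-ups are never asked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical tree node: a question plus follow-up subtrees keyed by answer.
class QuestionNode {
    final String name;
    final String prompt;
    // Follow-ups asked only when the user's answer equals the key (e.g. "yes").
    final Map<String, List<QuestionNode>> followUps = new LinkedHashMap<>();

    QuestionNode(String name, String prompt) {
        this.name = name;
        this.prompt = prompt;
    }

    // Registers a follow-up question for a given answer; returns this for chaining.
    QuestionNode when(String answer, QuestionNode child) {
        followUps.computeIfAbsent(answer, k -> new ArrayList<>()).add(child);
        return this;
    }
}

class QuestionTreeWalker {
    // Depth-first walk over the roots; `answerFor` stands in for the CLI or Web UI.
    static Map<String, String> walk(List<QuestionNode> roots,
                                    Function<QuestionNode, String> answerFor) {
        Map<String, String> answers = new LinkedHashMap<>();
        for (QuestionNode root : roots) {
            visit(root, answerFor, answers);
        }
        return answers;
    }

    private static void visit(QuestionNode node, Function<QuestionNode, String> answerFor,
                              Map<String, String> answers) {
        String answer = answerFor.apply(node);
        answers.put(node.name, answer);
        // Only the branch matching this answer is explored.
        for (QuestionNode child : node.followUps.getOrDefault(answer, Collections.emptyList())) {
            visit(child, answerFor, answers);
        }
    }
}
```

Answering `no` to "Hive import?" records a single answer; answering `yes` leads on to "Hive table name?". Passing the answer source as a `Function` keeps the tree independent of which UI is collecting the answers.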

If we decide to implement a dependency graph, another related question is how
to collect dependency information from connector developers. A while ago, I
played a bit with the connector interface, and one of Arvind's suggestions
was to use Java annotations to embed metadata such as label, max length, etc.
For example:

  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
  protected String connectString = null;

  @ConnectionInput(name = "inp-conn-username")
  protected String username = null;

  @ConnectionInput(name = "inp-conn-password", hidden = true)
  protected String password = null;

(You can find the full implementation on my GitHub:
http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7)
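An annotation like the one above is only visible to the framework if it is retained at runtime and read via reflection. The sketch below assumes a `ConnectionInput` declaration with `name`, `maxChars`, and `hidden` elements, guessed from the usage above (the declaration in the linked commit may differ); `ExampleConnectionConfig`, `InputMetadataReader`, and the `-1` "no limit" default are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Assumed declaration, inferred from the usage above; the real one may differ.
@Retention(RetentionPolicy.RUNTIME) // RUNTIME so reflection can read it
@Target(ElementType.FIELD)
@interface ConnectionInput {
    String name();
    int maxChars() default -1;      // -1 means "no limit" (assumed convention)
    boolean hidden() default false; // e.g. mask passwords when prompting
}

// Hypothetical connector configuration using the annotation.
class ExampleConnectionConfig {
    @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
    protected String connectString = null;

    @ConnectionInput(name = "inp-conn-password", hidden = true)
    protected String password = null;

    protected int notAnInput = 0; // unannotated, so it is skipped
}

class InputMetadataReader {
    // Returns one description per annotated field. Note that
    // getDeclaredFields() does not guarantee any particular order.
    static List<String> describe(Class<?> configClass) {
        List<String> result = new ArrayList<>();
        for (Field field : configClass.getDeclaredFields()) {
            ConnectionInput meta = field.getAnnotation(ConnectionInput.class);
            if (meta == null) {
                continue; // not a user-facing input
            }
            result.add(field.getName() + " -> " + meta.name()
                    + " maxChars=" + meta.maxChars() + " hidden=" + meta.hidden());
        }
        return result;
    }
}
```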

Regardless of whether we use annotations or some other mechanism, it seems
straightforward to embed metadata for individual inputs in the connector
interface. But it is not clear to me how to embed metadata that spans
multiple inputs, such as dependency information. I am wondering if anyone has
a good suggestion about how to achieve this.

Thoughts?

Thanks,
Cheolsoo