Re: Sqoop2 questions
Hi Cheolsoo,

Excellent questions. Please see my comments inline below:

On Mon, Jun 18, 2012 at 2:55 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:

> Hi Sqoop developers,
>
> Thinking about the Sqoop2 client interface, I have a couple of questions
> that I'd like to ask everyone.
>
> *1. CLI vs. Web UI*
>
> Currently, we have the MForm type that represents a group of questions. For
> example, an import job MForm consists of several MInputs such as table name,
> target dir, etc.
>
> This seems to make perfect sense for Web UI as we can present a group of
> questions in a single form, but I don't think that this fits well with CLI.
> For example, in Web UI, we may ask multiple questions in a single form as
> follows:
>
> Table name: ___
> Target dir: ___
> Columns: ___
>
> But in CLI, we can't really do the same. Instead, we're forced to ask a
> single question at a time:
>
> Table name? <enter>
> Target dir? <enter>
> Columns? <enter>
>
> Granted, we could simply iterate over the MInputs of an MForm in CLI, but I am
> wondering if we can have a better logical representation that fits well
> with both UIs.
>

Both the Web UI and the CLI will operate on one instance of MForm at
a given time. This instance will be sent back to the server for validation
and will come back with validation errors annotating the various MInputs
within it. If everything is valid, the server will send the next MForm in
the series. What the CLI/Web UI does with the MForm is up to the client
implementation. For example, it could describe the inputs necessary upfront
and then go into interactive mode for each one of them.
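
To make the round trip concrete, here is a minimal sketch of how a CLI client could handle one form: describe the inputs upfront, prompt for each one, submit for validation, and re-ask only the inputs that came back annotated with an error. Everything in it is an assumption for illustration: SketchForm and SketchInput are hypothetical stand-ins (not the real MForm/MInput classes), and validate() is a local stand-in for the server call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FormRoundTripSketch {

    /** Hypothetical stand-in for a single input; not the real MInput API. */
    static class SketchInput {
        final String name;
        final int maxChars;
        String value = "";
        String error = null; // set by validate(), null when the value is valid

        SketchInput(String name, int maxChars) {
            this.name = name;
            this.maxChars = maxChars;
        }
    }

    /** Hypothetical stand-in for a form; not the real MForm API. */
    static class SketchForm {
        final String title;
        final List<SketchInput> inputs = new ArrayList<>();

        SketchForm(String title) {
            this.title = title;
        }

        SketchForm add(String name, int maxChars) {
            inputs.add(new SketchInput(name, maxChars));
            return this;
        }
    }

    /** Stand-in for server-side validation: annotates each input, returns true if all are valid. */
    static boolean validate(SketchForm form) {
        boolean ok = true;
        for (SketchInput in : form.inputs) {
            if (in.value.isEmpty()) {
                in.error = "required";
            } else if (in.value.length() > in.maxChars) {
                in.error = "longer than " + in.maxChars + " characters";
            } else {
                in.error = null;
            }
            ok &= (in.error == null);
        }
        return ok;
    }

    /** CLI rendering: list the inputs upfront, then ask for each unanswered or invalid one. */
    static void prompt(SketchForm form, Scanner answers) {
        StringBuilder names = new StringBuilder();
        for (SketchInput in : form.inputs) {
            if (names.length() > 0) {
                names.append(", ");
            }
            names.append(in.name);
        }
        System.out.println(form.title + " needs: " + names);
        for (SketchInput in : form.inputs) {
            if (!in.value.isEmpty() && in.error == null) {
                continue; // already answered and accepted
            }
            if (in.error != null) {
                System.out.println("  (" + in.name + ": " + in.error + ")");
            }
            System.out.print(in.name + "? ");
            in.value = answers.hasNextLine() ? answers.nextLine().trim() : "";
        }
    }

    /** Repeats prompt and validate; returns the number of rounds used, or -1 if it gave up. */
    static int fill(SketchForm form, Scanner answers, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            prompt(form, answers);
            if (validate(form)) {
                return round;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SketchForm form = new SketchForm("Import job")
                .add("Table name", 8)
                .add("Target dir", 64);
        // Canned answers keep the sketch runnable without a terminal.
        Scanner canned = new Scanner("a_table_name_that_is_too_long\n/tmp/out\nemp\n");
        int rounds = fill(form, canned, 5);
        System.out.println();
        System.out.println("Accepted after " + rounds + " round(s)");
    }
}
```

A Web UI could reuse the same form object and simply render all inputs at once, showing the error annotations next to the fields instead of re-prompting.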

One alternative to iterating over the inputs would be to use a library like
curses to do terminal manipulation. However, that would require native
code and would have a significant maintenance cost associated with it.

> *2. Dependency among options*
>
> Given that one of the design goals of Sqoop2 is ease of use, we should be
> able to guide users through various options by asking relevant questions
> based on their previous answers. For example, in an import job, we may ask
> different questions depending on whether or not it is a Hive import.
>
> Hive import?
>  Yes --> Hive table name?
>  No --> no further question
>
> To do this, I think that we need some sort of dependency graph to represent
> options, and I think that a tree structure (thanks Bilung for your
> suggestion) makes more sense than a list. (The current implementation of
> MForm is a list.)
>
> If we decide to implement a dependency graph, another related question is how
> to collect dependency information from connector developers. A while ago, I
> played a bit with the connector interface, and one of the suggestions from
> Arvind was to use Java annotations to embed metadata such as label, max
> length, etc. For example:
>
>  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
>  protected String connectString = null;
>
>  @ConnectionInput(name = "inp-conn-username")
>  protected String username = null;
>
>  @ConnectionInput(name = "inp-conn-password", hidden = true)
>  protected String password = null;
>
> (You can find the full implementation on my GitHub:
> http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7)
>
> Regardless of whether we use annotations or another mechanism, it seems
> straightforward to embed metadata for individual inputs in the connector
> interface. But it is not clear to me how to embed metadata among inputs,
> such as dependency information. I am wondering if anyone has a good
> suggestion about how to achieve this.
>

The dependency will be resolved dynamically by the server. For example, if
the user selects Hive import there would be a follow-up form that will try
to get details on the various options that the server requires to
successfully enable that.
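
As a rough illustration of resolving the dependency on the server side, the sketch below picks the next form from the answers to the previous one, so the client never needs a dependency graph. The names (SketchForm, nextForm, formSequence) and the "yes" answer convention are assumptions made for this example only, not part of the Sqoop2 code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class FollowUpFormSketch {

    /** Hypothetical form: a name plus the labels of the inputs it asks for. */
    static class SketchForm {
        final String name;
        final List<String> inputs;

        SketchForm(String name, String... inputs) {
            this.name = name;
            this.inputs = Arrays.asList(inputs);
        }
    }

    static final SketchForm IMPORT =
            new SketchForm("import", "Table name", "Target dir", "Hive import");
    static final SketchForm HIVE =
            new SketchForm("hive", "Hive table name");

    /** Stand-in for the server: chooses the follow-up form from the submitted answers. */
    static Optional<SketchForm> nextForm(SketchForm submitted, Map<String, String> answers) {
        if (submitted == IMPORT && "yes".equalsIgnoreCase(answers.get("Hive import"))) {
            return Optional.of(HIVE);
        }
        return Optional.empty(); // no further questions
    }

    /** Client loop: keeps handling forms until the server has none left. */
    static List<String> formSequence(Map<String, String> answers) {
        List<String> seen = new ArrayList<>();
        Optional<SketchForm> current = Optional.of(IMPORT);
        while (current.isPresent()) {
            seen.add(current.get().name);
            current = nextForm(current.get(), answers);
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> answers = new HashMap<>();
        answers.put("Hive import", "yes");
        System.out.println("With Hive import: " + formSequence(answers));
        answers.put("Hive import", "no");
        System.out.println("Without Hive import: " + formSequence(answers));
    }
}
```

With this shape, the connector only declares individual inputs, and the branching logic lives in one place on the server.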

Regards,
Arvind Prabhakar

>
> Thoughts?
>
> Thanks,
> Cheolsoo
>