Re: Sqoop2 questions
Thanks for your answers, Arvind!

All of this makes sense to me.

Cheolsoo

On Mon, Jun 18, 2012 at 3:46 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:

> Hi Cheolsoo,
>
> Excellent questions. Please see my comments inline below:
>
> On Mon, Jun 18, 2012 at 2:55 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
> > Hi Sqoop developers,
> >
> > Thinking about the Sqoop2 client interface, I have a couple of questions
> > that I'd like to ask everyone.
> >
> > *1. CLI vs. Web UI*
> >
> > Currently, we have an MForm type that represents a group of questions. For
> > example, an import job MForm consists of several MInputs, such as table
> > name, target dir, etc.
> >
> > This seems to make perfect sense for the Web UI, as we can present a group
> > of questions in a single form, but I don't think it fits the CLI as well.
> > For example, in the Web UI, we may ask multiple questions in a single form
> > as follows:
> >
> > Table name: ___
> > Target dir: ___
> > Columns: ___
> >
> > But in the CLI, we can't really do the same. Instead, we're forced to ask a
> > single question at a time:
> >
> > Table name? <enter>
> > Target dir? <enter>
> > Columns? <enter>
> >
> > Granted, we could simply iterate over the MInputs of an MForm in the CLI,
> > but I am wondering whether we can find a better logical representation
> > that fits both UIs well.
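As a rough sketch of what "iterating over the MInputs of an MForm" could look like in a CLI, the Java below uses hypothetical `Form` and `Input` classes (simplified stand-ins, not the actual Sqoop2 model API) and asks one question per input, reading one line per answer:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical stand-ins for MForm/MInput; not the real Sqoop2 model classes.
class CliFormDemo {
    static final class Input {
        final String name;
        final String label;

        Input(String name, String label) {
            this.name = name;
            this.label = label;
        }
    }

    static final class Form {
        final List<Input> inputs = new ArrayList<>();

        // Appends an input and returns the form so calls can be chained.
        Form add(String name, String label) {
            inputs.add(new Input(name, label));
            return this;
        }
    }

    // Asks one question per input, in order, reading one line per answer.
    // Missing lines (end of input) are recorded as empty answers.
    static Map<String, String> fill(Form form, Scanner in) {
        Map<String, String> answers = new LinkedHashMap<>();
        for (Input input : form.inputs) {
            System.out.print(input.label + "? ");
            answers.put(input.name, in.hasNextLine() ? in.nextLine().trim() : "");
        }
        return answers;
    }

    public static void main(String[] args) {
        Form importForm = new Form()
                .add("table", "Table name")
                .add("targetDir", "Target dir")
                .add("columns", "Columns");
        System.out.println(fill(importForm, new Scanner(System.in)));
    }
}
```

Passing the `Scanner` in, rather than reading `System.in` directly, keeps the prompting loop testable with canned input.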
> >
>
> Both the Web UI and the CLI will operate on one instance of MForm at a
> time. This instance will be sent back to the server for validation and
> will come back with validation errors annotating the various MInputs
> within it. If everything is valid, the server will send the next MForm in
> the series. What the CLI or Web UI does with the MForm is up to the client
> implementation. For example, it could describe the necessary inputs upfront
> and then go into interactive mode for each one of them.
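The round trip described here can be sketched as a small simulation. Everything in it is hypothetical: `Form`, `Server.submit` and the "value is required" rule stand in for the real Sqoop2 model and server-side validation. The client asks for every input of a fresh form, re-asks only the inputs the server annotated with errors, and moves on once the server returns the next form in the series:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical model of the client/server form round trip; not the Sqoop2 API.
class FormRoundTripDemo {
    // A form: ordered input name -> value, plus error annotations per input.
    static final class Form {
        final String name;
        final Map<String, String> values = new LinkedHashMap<>();
        final Map<String, String> errors = new LinkedHashMap<>();

        Form(String name, String... inputNames) {
            this.name = name;
            for (String input : inputNames) {
                values.put(input, "");
            }
        }
    }

    // Simulated server holding a fixed series of forms.
    static final class Server {
        private final List<Form> series;
        private int current = 0;

        Server(List<Form> series) {
            this.series = series;
        }

        Form first() {
            return series.get(0);
        }

        // Annotates every empty input with an error. Returns the same form if any
        // input is invalid, otherwise the next form, or null after the last one.
        Form submit(Form form) {
            form.errors.clear();
            form.values.forEach((input, value) -> {
                if (value.isEmpty()) {
                    form.errors.put(input, "value is required");
                }
            });
            if (!form.errors.isEmpty()) {
                return form;
            }
            current++;
            return current < series.size() ? series.get(current) : null;
        }
    }

    // Client loop: ask all inputs of a new form, but only the flagged ones on a retry.
    // Returns the names of the forms the server accepted, in order.
    static List<String> run(Server server, Function<String, String> prompt) {
        List<String> accepted = new ArrayList<>();
        Form form = server.first();
        while (form != null) {
            Set<String> pending = form.errors.isEmpty() ? form.values.keySet() : form.errors.keySet();
            for (String input : new ArrayList<>(pending)) {
                String answer = prompt.apply(input);
                form.values.put(input, answer == null ? "" : answer);
            }
            Form next = server.submit(form);
            if (next != form) {
                accepted.add(form.name);
            }
            form = next;
        }
        return accepted;
    }
}
```

Re-asking only the annotated inputs mirrors the idea that validation errors come back attached to individual MInputs, so the client never has to re-collect values that were already accepted.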
>
> One alternative to iterating over the inputs would be to use a library like
> curses to do terminal manipulation. However, that would require native
> code and would carry a significant maintenance cost.
>
>
> > *2. Dependency among options*
> >
> > Given that one of the design goals of Sqoop2 is ease of use, we should be
> > able to guide users through the various options by asking relevant
> > questions based on their previous answers. For example, in an import job,
> > we may ask different questions depending on whether or not it is a Hive
> > import.
> >
> > Hive import?
> >  Yes --> Hive table name?
> >  No --> no further question
> >
> > To do this, I think that we need some sort of dependency graph to
> > represent options, and I think that a tree structure (thanks, Bilung, for
> > your suggestion) makes more sense than a list. (The current implementation
> > of MForm is a list.)
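One way to picture the tree described here, as a sketch under stated assumptions rather than a Sqoop2 design: each hypothetical `Node` holds a question, and the answer selects the follow-up branch, so answering "no" to "Hive import?" simply ends the walk:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical option tree: each node is a question whose answer selects the next branch.
class OptionTreeDemo {
    static final class Node {
        final String question;
        // Maps an answer to its follow-up question; answers with no entry end the branch.
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String question) {
            this.question = question;
        }

        Node on(String answer, Node next) {
            children.put(answer, next);
            return this;
        }
    }

    // Walks the tree from the root and returns the questions actually asked, in order.
    static List<String> walk(Node root, Function<String, String> answerFor) {
        List<String> asked = new ArrayList<>();
        Node node = root;
        while (node != null) {
            asked.add(node.question);
            node = node.children.get(answerFor.apply(node.question));
        }
        return asked;
    }

    // The Hive example from the thread: only a "yes" leads to a further question.
    static Node hiveImportTree() {
        return new Node("Hive import?").on("yes", new Node("Hive table name?"));
    }

    public static void main(String[] args) {
        System.out.println(walk(hiveImportTree(), q -> q.equals("Hive import?") ? "yes" : "sales"));
    }
}
```

Compared with a flat list, the set of questions asked is decided by the path taken through the tree, which is exactly the property a list-shaped MForm cannot express.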
> >
> > If we decide to implement a dependency graph, another related question is
> > how to collect dependency information from connector developers. A while
> > ago, I played a bit with the connector interface, and one of the
> > suggestions from Arvind was to use Java annotations to embed metadata such
> > as label, max length, etc. For example:
> >
> >  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
> >  protected String connectString = null;
> >
> >  @ConnectionInput(name = "inp-conn-username")
> >  protected String username = null;
> >
> >  @ConnectionInput(name = "inp-conn-password", hidden = true)
> >  protected String password = null;
> >
> > (You can find the full implementation on my GitHub:
> > http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7
> > )
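For readers without access to that commit, here is a hypothetical definition of `@ConnectionInput` that would make the annotated fields above compile, plus a reflection pass that reads the metadata back. The attribute names `name`, `maxChars` and `hidden` come from the snippet; their types and defaults are assumptions:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical definition of @ConnectionInput; attribute names follow the email,
// but the types and defaults are assumptions made for this sketch.
@Retention(RetentionPolicy.RUNTIME)   // must be RUNTIME so reflection can see it
@Target(ElementType.FIELD)
@interface ConnectionInput {
    String name();
    int maxChars() default -1;        // -1 means "no limit" in this sketch
    boolean hidden() default false;   // e.g. mask the value when prompting
}

class ExampleConnection {
    @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
    protected String connectString = null;

    @ConnectionInput(name = "inp-conn-username")
    protected String username = null;

    @ConnectionInput(name = "inp-conn-password", hidden = true)
    protected String password = null;
}

class ConnectionInputScanner {
    // Reads the annotation off each declared field and renders a one-line summary.
    static List<String> describe(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            ConnectionInput meta = field.getAnnotation(ConnectionInput.class);
            if (meta != null) {
                result.add(meta.name() + " maxChars=" + meta.maxChars() + " hidden=" + meta.hidden());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        describe(ExampleConnection.class).forEach(System.out::println);
    }
}
```

Note that `Class.getDeclaredFields()` does not guarantee any particular order, so a form builder based on this approach would need an explicit ordering attribute if input order matters.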
> >
> > Regardless of whether we use annotations or another mechanism, it seems
> > straightforward to embed metadata for individual inputs in the connector
> > interface. But it is not clear to me how to embed metadata that spans
> > several inputs, such as dependency information. I am wondering whether
> > anyone has a good suggestion for how to achieve this.
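Purely as an illustration of one possible encoding (not something the thread settled on), dependency information could be attached to a field with a second hypothetical annotation, `@DependsOn`, that names the controlling input and the value that enables the dependent one. `@ImportInput`, `ImportJobOptions` and `DependencyResolver` are likewise invented for this sketch; the resolver computes which inputs are relevant given the answers so far:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical per-input metadata (not a Sqoop2 API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ImportInput {
    String name();
}

// Hypothetical cross-input metadata: this input applies only when another input has a given value.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DependsOn {
    String input();   // name of the controlling input
    String value();   // value of the controlling input that enables this one
}

class ImportJobOptions {
    @ImportInput(name = "hive-import")
    protected String hiveImport = null;

    @ImportInput(name = "hive-table")
    @DependsOn(input = "hive-import", value = "yes")
    protected String hiveTable = null;
}

class DependencyResolver {
    // Inputs without @DependsOn are always active; dependent inputs are active
    // only when the controlling input's current answer equals the required value.
    static List<String> activeInputs(Class<?> type, Map<String, String> answers) {
        List<String> active = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            ImportInput input = field.getAnnotation(ImportInput.class);
            if (input == null) {
                continue;
            }
            DependsOn dep = field.getAnnotation(DependsOn.class);
            if (dep == null || dep.value().equals(answers.get(dep.input()))) {
                active.add(input.name());
            }
        }
        return active;
    }

    public static void main(String[] args) {
        System.out.println(activeInputs(ImportJobOptions.class, Map.of("hive-import", "yes")));
    }
}
```

This sketch handles only one level of conditions: it does not check whether the controlling input is itself active, and it does not detect cycles between inputs.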
> >
>
> The dependency will be resolved dynamically by the server. For example, if