
Sqoop >> mail # dev >> Sqoop2 questions


Re: Sqoop2 questions
Thanks for your answers, Arvind!

All of them make sense to me.

Cheolsoo

On Mon, Jun 18, 2012 at 3:46 PM, Arvind Prabhakar <[EMAIL PROTECTED]> wrote:

> Hi Cheolsoo,
>
> Excellent questions. Please see my comments inline below:
>
> On Mon, Jun 18, 2012 at 2:55 PM, Cheolsoo Park <[EMAIL PROTECTED]> wrote:
>
> > Hi Sqoop developers,
> >
> > Thinking about the Sqoop2 client interface, I have a couple of questions
> > that I'd like to ask everyone.
> >
> > *1. CLI vs. Web UI*
> >
> > Currently, we have the MForm type, which represents a group of questions.
> > For example, an import job MForm consists of several MInputs, such as
> > table name, target dir, etc.
> >
> > This makes perfect sense for the Web UI, since we can present a group of
> > questions in a single form, but I don't think it fits the CLI as well.
> > For example, in the Web UI, we may ask multiple questions in a single
> > form as follows:
> >
> > Table name: ___
> > Target dir: ___
> > Columns: ___
> >
> > But in CLI, we can't really do the same. Instead, we're forced to ask a
> > single question at a time:
> >
> > Table name? <enter>
> > Target dir? <enter>
> > Columns? <enter>
> >
> > Granted, we could simply iterate over the MInputs of an MForm in the CLI,
> > but I am wondering if we can find a better logical representation that
> > fits both UIs well.
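The per-input iteration mentioned above can be sketched in Java. This is a minimal illustration, not Sqoop2 code: `SimpleForm`, `SimpleInput` and `CliFormPrompter` are hypothetical stand-ins for MForm and MInput, and the prompt format is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for MForm/MInput; not Sqoop2's real classes.
final class SimpleInput {
    final String name;   // key the answer is stored under
    final String label;  // text shown to the user
    SimpleInput(String name, String label) { this.name = name; this.label = label; }
}

final class SimpleForm {
    final String name;
    final List<SimpleInput> inputs;
    SimpleForm(String name, List<SimpleInput> inputs) { this.name = name; this.inputs = inputs; }
}

final class CliFormPrompter {
    // Asks for each input of the form in turn, one question per line.
    static Map<String, String> fill(SimpleForm form, BufferedReader in, PrintStream out)
            throws IOException {
        Map<String, String> answers = new LinkedHashMap<>();  // keeps question order
        for (SimpleInput input : form.inputs) {
            out.print(input.label + "? ");
            out.flush();
            String line = in.readLine();                      // null at end of input
            answers.put(input.name, line == null ? "" : line.trim());
        }
        return answers;
    }

    public static void main(String[] args) throws IOException {
        SimpleForm importForm = new SimpleForm("import", List.of(
                new SimpleInput("table", "Table name"),
                new SimpleInput("targetDir", "Target dir"),
                new SimpleInput("columns", "Columns")));
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        System.out.println(fill(importForm, stdin, System.out));
    }
}
```

Run interactively, `main` asks "Table name?", "Target dir?" and "Columns?" one after another, which is exactly the one-question-at-a-time CLI flow described in the message.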
> >
>
> Both the Web UI and the CLI will operate on one instance of MForm at a
> given time. This instance will be sent back to the server for validation
> and will come back with validation errors annotating the various MInputs
> within it. If everything is valid, the server will send the next MForm in
> the series. What the CLI/Web UI does with the MForm is up to the client
> implementation. For example, it could describe the necessary inputs upfront
> and then go into interactive mode for each one of them.
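The round trip described in the reply (submit one form, get it back with validation errors annotating its inputs, move on once it is valid) might look roughly like this on the client side. `FormServer`, `FormSeriesClient` and the `ui` callback are hypothetical names for illustration; the actual Sqoop2 client/server protocol is not specified in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical server contract (not Sqoop2's actual API): validate one form
// at a time and hand out the next form in the series once it passes.
interface FormServer {
    // Returns input name -> error message; an empty map means the form is valid.
    Map<String, String> validate(String form, Map<String, String> values);

    // Returns the form that follows `form`, or empty when the series is done.
    Optional<String> nextForm(String form);
}

final class FormSeriesClient {
    // `ui` stands in for either the CLI or the Web UI: given the form name and
    // the errors from the previous round trip, it returns the user's values.
    static List<String> run(FormServer server, String firstForm,
                            BiFunction<String, Map<String, String>, Map<String, String>> ui) {
        List<String> completed = new ArrayList<>();
        Optional<String> current = Optional.of(firstForm);
        while (current.isPresent()) {
            String form = current.get();
            Map<String, String> errors = Map.of();
            do {
                Map<String, String> values = ui.apply(form, errors); // show form + annotations
                errors = server.validate(form, values);              // round trip to the server
            } while (!errors.isEmpty());                             // re-present until valid
            completed.add(form);
            current = server.nextForm(form);
        }
        return completed;
    }
}
```

Because the UI is only a callback here, the same loop could serve a CLI that prompts per input and a Web UI that renders the whole form at once.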
>
> One alternative to iterating over the inputs would be to use a library like
> curses for terminal manipulation. However, that would require native code
> and would carry a significant maintenance cost.
>
>
> > *2. Dependency among options*
> >
> > Given that one of the design goals of Sqoop2 is ease of use, we should be
> > able to guide users through the various options by asking relevant
> > questions based on their previous answers. For example, in an import job,
> > we may ask different questions depending on whether or not it is a Hive
> > import.
> >
> > Hive import?
> >  Yes --> Hive table name?
> >  No --> no further question
> >
> > To do this, I think we need some sort of dependency graph to represent
> > options, and I think a tree structure (thanks, Bilung, for the
> > suggestion) makes more sense than a list. (The current implementation of
> > MForm is a list.)
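A tree of options, as suggested above, can be modeled by letting each answer select the subtree that is asked next. The sketch assumes hypothetical `OptionNode` and `OptionTreeWalker` classes; it only illustrates the data structure, not how Sqoop2 actually models options.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical option tree: each node is one question, and the user's answer
// selects which child questions (if any) are asked next.
final class OptionNode {
    final String name;      // key under which the answer is stored
    final String question;  // prompt shown to the user
    final Map<String, List<OptionNode>> branches = new LinkedHashMap<>();

    OptionNode(String name, String question) {
        this.name = name;
        this.question = question;
    }

    // Registers the follow-up questions for a given answer; returns this for chaining.
    OptionNode on(String answer, OptionNode... children) {
        branches.put(answer, List.of(children));
        return this;
    }
}

final class OptionTreeWalker {
    // Asks the root questions in order, descending depth-first into the branch
    // selected by each answer; an answer with no registered branch ends that path.
    static Map<String, String> walk(List<OptionNode> roots, Function<OptionNode, String> ask) {
        Map<String, String> answers = new LinkedHashMap<>();
        visit(roots, ask, answers);
        return answers;
    }

    private static void visit(List<OptionNode> nodes, Function<OptionNode, String> ask,
                              Map<String, String> answers) {
        for (OptionNode node : nodes) {
            String answer = ask.apply(node);
            answers.put(node.name, answer);
            visit(node.branches.getOrDefault(answer, List.of()), ask, answers);
        }
    }
}
```

For the Hive example, a `hiveImport` node built with `.on("yes", new OptionNode("hiveTable", "Hive table name?"))` asks for the Hive table name only when the answer is `yes`; any other answer leads to no further question.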
> >
> > If we decide to implement a dependency graph, a related question is how
> > to collect dependency information from connector developers. A while ago,
> > I played a bit with the connector interface, and one of Arvind's
> > suggestions was to use Java annotations to embed metadata such as label,
> > max length, etc. For example:
> >
> >  @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
> >  protected String connectString = null;
> >
> >  @ConnectionInput(name = "inp-conn-username")
> >  protected String username = null;
> >
> >  @ConnectionInput(name = "inp-conn-password", hidden = true)
> >  protected String password = null;
> >
> > (You can find the full implementation on my GitHub:
> > http://github.sf.cloudera.com/cheolsoo/apache-sqoop/commit/af8dde141c3ae1e0e70f178a241171d36421aec7
> > )
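The annotation approach above needs an annotation type retained at runtime plus a reflective scan of the connector's fields. The definition of `@ConnectionInput` is not shown in the message, so the one below is an assumption (attribute names taken from the snippet, defaults guessed), as are the `ExampleConnectionConfig` and `InputMetadataScanner` classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Assumed definition of @ConnectionInput; RUNTIME retention so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ConnectionInput {
    String name();
    int maxChars() default -1;      // -1 means "no limit" (assumed convention)
    boolean hidden() default false; // e.g. mask the value when prompting
}

// Hypothetical connector configuration using the fields from the message.
class ExampleConnectionConfig {
    @ConnectionInput(name = "inp-conn-connectstring", maxChars = 128)
    protected String connectString = null;

    @ConnectionInput(name = "inp-conn-username")
    protected String username = null;

    @ConnectionInput(name = "inp-conn-password", hidden = true)
    protected String password = null;
}

final class InputMetadataScanner {
    // Returns one description per annotated field, in whatever order
    // getDeclaredFields() reports them; unannotated fields are skipped.
    static List<String> describe(Class<?> configClass) {
        List<String> result = new ArrayList<>();
        for (Field field : configClass.getDeclaredFields()) {
            ConnectionInput meta = field.getAnnotation(ConnectionInput.class);
            if (meta == null) {
                continue;
            }
            result.add(meta.name() + " field=" + field.getName()
                    + " maxChars=" + meta.maxChars() + " hidden=" + meta.hidden());
        }
        return result;
    }
}
```

Note that `getDeclaredFields()` does not guarantee any particular order, so if the UI must present inputs in a fixed sequence, an explicit ordering attribute would probably be needed.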
> >
> > Regardless of whether we use annotations or some other mechanism, it
> > seems straightforward to embed metadata for individual inputs in the
> > connector interface. But it is not clear to me how to embed metadata that
> > spans several inputs, such as dependency information. I am wondering if
> > anyone has a good suggestion about how to achieve this.
> >
>
> The dependency will be resolved dynamically by the server. For example, if